Make universal
health care a priority 


World leaders and international donors must 
help to strengthen the health systems of the 
most vulnerable nations. 


As the 2019 novel coronavirus continues its
deadly rampage, the World Health Organiza- 
tion (WHO) is rightly drawing attention to the 
risks the virus poses to the poorest and most 
vulnerable nations — particularly in Africa. 

As Nature went to press, more than 43,000 infections 
and more than 1,000 deaths had been confirmed. Soon, 
thousands of China’s citizens will be returning to their jobs 
on the African continent after an extended new-year holiday.
If the virus also reaches Africa, it could spread rapidly
and undetected because health systems in many regions 
are too fragile and underfunded to cope. 

As a result, the WHO has scrambled to equip 14 countries
— including the Democratic Republic of the Congo,
Ethiopia and Nigeria — with diagnostics, expertise and 
equipment to detect and contain the virus. The agency 
has also appealed for US$675 million to assist vulnerable 
countries — an amount that it estimates will last only until 
the end of April. 

And yet, as donors start to provide emergency aid — the 
Bill & Melinda Gates Foundation was among the first with 
a $100-million pledge — it’s hard to avoid the feeling of 
déjà vu. Infectious-disease outbreaks are often accompanied
by such pledges to improve disease surveillance, and
by promises to provide funds for drug and vaccine devel- 
opment. What is less forthcoming is sustainable funding 
for clinics providing community-level general medicine, 
and for medical and nursing education, as well as invest- 
ments to sustain hospitals with supplies, electricity and 
running water. 

These are all steps that would help countries to combat 
infectious diseases and improve overall public health — 
as WHO director-general Tedros Adhanom Ghebreyesus 
urged in a statement at the end of last month. Seven of the
nations that the WHO will be helping scarcely have one 
nurse per 1,000 people, according to the most recent 
statistics from the World Bank. And more than 50% of the 
continent’s 1.2 billion inhabitants lack access to essential 
primary care. 

To be fair, a shift in outlook has already begun. In 2016, 
the World Bank and the Global Fund to Fight AIDS, Tuber- 
culosis and Malaria committed $24 billion over three to 
five years for universal health care in Africa. And Rwanda’s 
president, Paul Kagame, is leading an African Union task 
force to achieve measurable universal health coverage in all 
of its 55 member states, partly by committing to spending
5% of gross domestic product on health care. 

A temporary surge of assistance aimed at infectious-disease
surveillance — as is happening now — might suffice
in places where health systems are reasonably robust. But 
for the poorest countries with the weakest systems, even 
the best projects will struggle once these grants come to 
an end, as the case of Ebola shows all too well. 

After the world’s biggest Ebola outbreak ended in 2016, 
donors, including the US government and the World Bank, 
put more than $100 million into initiatives to strengthen 
health and disease-surveillance systems in the three coun- 
tries that were worst hit — Liberia, Sierra Leone and Guinea. 

But many of these initiatives are ending, and health 
care is showing signs of erosion. Since last summer, pro- 
tests have been erupting in Liberia as the economy and 
the national health system have crumbled. Major hos- 
pitals are reported to lack life-saving drugs, and health 
workers and lab technicians say they have not been paid 
for months. Patients have been turned away from clinics 
empty-handed. This problem isn’t specific to Liberia. In 
many of the poorest countries, staff in national health 
systems barely earn a living. 

International donors have reasons for not providing 
long-term funding for salaries for public employees. One 
of their biggest fears is that in doing so they would become 
too deeply involved in the workings of government depart- 
ments, which are often complicated organizations to 
navigate. Another worry is that donors could be perceived 
as telling sovereign governments what to do. 

Clearly, finding solutions to these problems will not be 
easy, but donors must consider how their initiatives can 
help to strengthen national health systems for the long 
term. For example, they could ensure that the health work- 
ers being trained to handle patients suspected of having 
coronavirus are still employed at hospitals five years later. 
This might not seem like a priority in the middle of an 
emergency, but it will pay off handsomely down the line. 

The march of the coronavirus reminds us yet again that 
world leaders and philanthropic donors pay attention to 
epidemics only when an infection is on their doorsteps. 
They must recognize that the time to think about the next 
epidemic is now. 


When it’s fine to fail 


The history of metrology holds valuable 
lessons for initiatives to reproduce results. 


Everyone’s talking about reproducibility — or
at least they are in the biomedical and social
sciences. The past decade has seen a growing
recognition that results must be independently
replicated before they can be accepted as true.

A focus on reproducibility is necessary in the physical
sciences, too — an issue explored in this month’s Nature Physics,
in which two metrologists argue that reproducibility
should be viewed through a different lens. When results in
the science of measurement cannot be reproduced, argue 
Martin Milton and Antonio Possolo, it’s a sign of the sci- 
entific method at work — and an opportunity to promote 
public awareness of the research process (M. J. T. Milton 
and A. Possolo Nature Phys. 16, 117-119; 2020).

The authors — at the International Bureau of Weights 
and Measures in Paris, and at the National Institute of
Standards and Technology in Gaithersburg, Maryland, 
respectively — draw on three case studies, each one an 
instalment in the quest to measure one of the fundamen- 
tal constants of nature. 

The researchers chose the speed of light (c); Planck’s 
constant (h), a number that links the amount of energy a
photon carries to its frequency; and the constant of grav- 
itation (G), a measure of the strength of the gravitational 
force between two bodies. 

For both Planck’s constant and the speed of light, dif- 
ferent laboratories have arrived at the same number using 
different methods — a sign of reproducibility. In the case 
of Planck’s constant, there’s now enough confidence in its 
value for it to become the basis of the International System
of Units definition of the kilogram that was confirmed last 
May. 

However, despite numerous experiments spanning three 
centuries, the precise value of G remains uncertain. The
root of the uncertainty is not fully understood: it could 
be due to undiscovered errors in how the value is being 
measured; or it could indicate the need for new physics. 
One scenario being explored is that G could even vary over 
time, in which case scientists might have to revise their 
view that it has a fixed value. 

If that were to happen — although physicists think it 
unlikely — it would be a good example of non-reproduced 
data being subjected to the scientific process: experimen- 
tal results questioning a long-held theory, or pointing to 
the existence of another theory altogether. 

Questions in biomedicine and in the social sciences do 
not reduce so cleanly to the determination of a fundamen- 
tal constant of nature. Compared with metrology, experi- 
ments to reproduce results in fields such as cancer biology 
are likely to include many more sources of variability, which
are fiendishly hard to control for. 

But metrology reminds us that when researchers 
attempt to reproduce the results of experiments, they do 
so using a set of agreed — and highly precise — experimental 
standards, known in the measurement field as metrological
traceability. It is this aspect, the authors contend, that
helps to build trust and confidence in the research process. 

One of the wider lessons from Milton and Possolo’s 
commentary is that researchers from different domains 
must continue to talk and to share their experiences of 
reproducibility. At the same time, we should be careful 
about assuming that there’s something inherently wrong 
when researchers cannot reproduce a result even when 
adhering to the best agreed standards. 

Irreproducibility should not automatically be seen as a 
sign of failure. It can also be an indication that it’s time to 
rethink our assumptions. 


Out-of-office should
mean what it says 


Employers must do more to support 
researchers when they take a break. 


Setting an out-of-office e-mail reply should
come with a sense of satisfaction. But in today’s
research world, an out-of-office message can
seem little more than creative fiction. Its existence
and the sender’s absence will not bring work
to a halt. They don’t prevent an overworked researcher
from feeling the need to check their inbox while away; nor 
do they stop senders attempting to contact people who 
are on holiday, and expecting a reply. 

Some out-of-office messages do a better job. Last 
October, Stephana Cherak, an epidemiologist at the 
University of Calgary in Canada, received an impressive 
example from a colleague. “I do not respond to e-mails on
weekends,” it read. “If this is an emergency, please call my
mobile. If you do not have my mobile number, then you do 
not have a weekend emergency.” 

Cherak approvingly tweeted the message. Of the more 
than 4,000 re-tweets and replies, many expressed support 
for drawing firm boundaries around time off, or offered 
their own tips. “My life has gotten much better since I 
decided that I don’t need ‘fastest/best/most consistent 
e-mail responder’ to be part of my professional legacy,” 
wrote @popmediaprof. And @runforbooze recommended 
that people politely write “I don’t expect an immediate 
reply” if they have to send a message out of office hours. 

We asked Cherak to reflect on this experience. In a column
in Nature's Careers section, she had advice for all those
trying to balance work with the rest of life (S. Cherak Nature 
578, 179-180; 2020). One recommendation is to ask for 
support from colleagues and supervisors. 

Such support is vital, and employers must recognize 
that their staff need it. Indeed, in France, the ‘right to dis- 
connect’ became law in 2017. Companies with more than 
50 staff members are now obliged to discourage out-of- 
hours and holiday e-mail communication. Where changing 
the law isn’t an option, a team of organizational psychologists
at the University of Manchester, UK, has suggested
setting up a ‘bounce-back’, so that e-mails received during
time off are automatically returned to the sender. 

There are several ways in which employers can support 
their staff when they take breaks, such as helping to put 
work on hold, accepting that projects will take a little 
longer and ensuring that essential tasks can be covered 
when colleagues are away. 

Switching off from work is increasingly difficult — we at 
Nature struggle with this as much as does any organization.
An out-of-office message must mean what it says if we are 
to have any hope of turning things around. 


A personal take on science and society


World view 


By Nahid 
Bhadelia 


Coronavirus: hospitals must 
learn from past pandemics 


Use methods honed in previous outbreaks to 
prepare for the next one, says Nahid Bhadelia. 


The world that is grappling with 2019 novel
coronavirus (2019-nCoV) is different from how
it was during the SARS and H1N1 pandemics.
The disease itself, and information and disin- 
formation, now travel faster than ever. 

I worked as a clinician in West Africa during the Ebola 
outbreak, and in New York City hospitals during the H1N1
one. Now, I’m working in Boston, Massachusetts, to prepare 
for potential cases of 2019-nCoV acute respiratory disease. 
And many of the challenges are the same as those faced in
previous outbreaks. 

The specifics of each virus are important, but so is an 
overarching question: what do you do when large numbers 
of people arrive wanting care for suspected infections of 
an unfamiliar disease? This comes down to three decisions:
how to quickly identify infected people, how to isolate and
care for them and how to keep health-care workers safe.

As this epidemic grows, two trends will make it harder 
to identify people with 2019-nCoV infections while coping 
with those showing similar symptoms in the middle of the 
current influenza season. First, the 2013 and 2016 Ebola 
outbreaks taught us the importance of travel history. But 
with more countries reporting 2019-nCoV cases, it will 
be harder to teach hospital workers what locations to ask 
people about, and hospitals will need to devise strategies 
to keep staff aware of the changing geography of risk. 

Second, as the H1N1 pandemic demonstrated, people 
with no relevant travel history will crowd emergency 
departments and other care settings. Hospitals and 
local public-health authorities will have to encourage 
people who are likely to be infected with 2019-nCoV to 
get diagnosed quickly while discouraging those infected 
with less-threatening diseases from seeking emergency 
treatment. Public-health authorities handle much of this 
education, but hospitals must strengthen communication 
among clinics and their patients. 

Current data suggest that people could transmit the 
new disease before they show symptoms (C. Rothe et 
al. N. Engl. J. Med. http://doi.org/ggjvr8; 2020). Besides 
rapidly identifying travellers, hospitals must strengthen 
infection-control measures that apply to anyone with res- 
piratory symptoms, such as by reinforcing hand hygiene 
and use of masks, frequently decontaminating crowded 
places and finding areas where patients with symptoms 
can be separated from others and cared for. 

Most samples are still being shipped from hospitals 
to reference laboratories. A test closer to the bedside is 
crucial for quickly identifying people with 2019-nCoV 
and separating them from others with similar symptoms.
Countries with confirmed cases are sharing viral genetic 
sequences — which makes developing tests easier. 

Many hospitals in richer countries must decide whether 
patients should be cared for in specialized biocontain- 
ment units created for people with Ebola virus disease or 
in rooms assigned to those with other airborne diseases, 
such as tuberculosis and measles. But the demand for both
could soon outstrip supply if the epidemic spreads, so hospitals
could create a stepwise plan: one for dealing with a
handful of patients, and another for when large numbers of 
sick patients cause a shortage of intensive-care beds. Hos- 
pitals might need to work with nearby facilities to ensure 
every person needing intensive care receives it. 

Another dilemma hospitals face is deciding what 
personal protective equipment (PPE) health-care workers 
should use to keep themselves from getting infected. The 
Centers for Disease Control and World Health Organization 
advise that workers could prevent contact with body fluids, 
contaminated surfaces and virus particles in the air from 
sneezing and coughing using a range of ensembles: gloves 
and coveralls or gowns, paired with personal air-purifying 
respirators or certified particulate-filtering face masks. 

Contrary to popular belief, the most protective option 
is not always the safest choice. Workers unaccustomed 
to complex PPE are more likely to use it incorrectly and 
thus put themselves at higher risk of infection. During the 
SARS epidemic, workers were at the highest risk of infec- 
tion when putting on and taking off their PPE. Hospitals 
will need to continually train staff in using this equipment,
and provide frequent reinforcement. Also, restrictive
PPE can affect the quality of care that patients receive. 
And uncommon PPE might be harder to get in large sup- 
plies. If supply changes mean that workers have to switch 
equipment mid-epidemic (as I experienced in West Africa),
confusion soars. In the end, what works for each facility 
varies with resources and setting. 

Hospitals will also have to manage illnesses among 
health-care staff. As more of their workers get sick, hospi- 
tals and clinics will have a harder time responding to the 
outbreak. But if health-care workers come in sick — and our 
experience in New York City during H1N1 showed that up 
to 60% of clinicians did so (N. Bhadelia et al. Infect. Control
Hosp. Epidemiol. 34, 825-831; 2013) — they could transmit 
the disease to patients and colleagues. Hospitals need staff- 
ing plans to cope with worker shortages. 

Connecting these three sets of decisions is the fact that 
scientific knowledge about a disease changes and (ideally) 
increases as a new epidemic progresses. There is little guidance
on how to craft policies and procedures while living
through the uncertainty caused by a new virus. When this
outbreak recedes, that guidance is where we must focus. 


The world this week


News in brief 


SUN'S ELUSIVE POLES TO BE 
IMAGED IN DETAIL FOR FIRST TIME 


A European mission that will 
take the closest-ever pictures 
of the Sun and give scientists 
their first clear look at the star’s 
uncharted poles launched on
9 February.

Equipped with 
10 instruments, the 
€500-million (US$550-million) 
Solar Orbiter, which took off 
from Cape Canaveral in Florida, 
will journey first to Mercury's 
orbit on a mission that could last
10 years. 

“Nobody has been able to 
take images this close to the Sun 
before,” says Helen O’Brien at 
Imperial College London, who 
manages the magnetometer 
instrument on the European 
Space Agency (ESA) mission, 
which also involves NASA. “We 
should see some beautiful 
images.” 

The mission’s main aim is to 
investigate interactions between 
the Sun and its heliosphere — the 
bubble of the star’s activity in 
space, says O’Brien. “It’s really 
important to work out how the 
energy propagates from the 
surface out into interplanetary 
space.” 

The spacecraft (pictured
on the left in this artist’s
impression) will be placed
into an orbit that will bring it,
at its closest, just 42 million
kilometres, or 0.28 astronomical
units, from the Sun (1 AU is the
distance between Earth and the 
Sun). It will take about two years 
to reach this orbit. 

The Solar Orbiter’s main 
science phase will begin in 
November 2021 and last for 
four years. But if the mission is
extended, as ESA scientists hope 
it will be, the craft would enter 
a second phase, which would
allow it to image the Sun’s poles. 
Over several years, mission 
controllers would raise the angle 
of the spacecraft’s orbit above 
the plane of the planets and up 
and over the Sun. 

“That will give us the first-ever 
views of the solar poles,” says 
Daniel Müller, a solar physicist at
ESA’s European Space Research 
and Technology Centre in 
Noordwijk, the Netherlands, 
who is the project scientist on
the mission. “We believe that is 
key to better understanding the 
Sun’s magnetic activity cycle.” 

A previous mission, ESA and 
NASA’s Ulysses spacecraft, flew 
over the poles in the 1990s and 
2000s — but it had no cameras. 


SCIENCE MINISTER'S
CANCER CLAIMS 
SPARK CONTROVERSY 


Excitement over the creation
of Colombia’s first Ministry
of Science, Technology and
Innovation has given way to 
anger and confusion over the 
appointment of Mabel Torres as 
science minister. The mycologist 
from the Technological 
University of El Chocó in Quibdó
has made public claims about 
the cancer-fighting properties 
of a mushroom extract that she
makes herself. 

Torres says that she has 
given it to around 40 people 
with cancer — some of whom, 
she says, have entered into 
remission. But the treatment 
was not given under the 
auspices of a clinical trial, the 
methodology was not approved 
by a medical-ethics committee,
and Torres has not submitted 
the results for publication in a
peer-reviewed journal. Critics 
want her to resign; one fears 
that her appointment might 
embolden people peddling 
unproven medical treatments. 

Torres defends her actions 
and says she has no plans to 
step down. “I haven't offered 
a drug, let alone marketed it. I 
have rigorously observed the 
established ethical protocols for 
scientific experimentation,” she 
said in a statement.

Torres’ supporters, who 
include prominent scientists, 
say she will be an advocate 
for marginalized regions — 
including El Chocó.


Two researchers at the South China Agricultural
University in Guangzhou have suggested that
pangolins — long-snouted mammals often used in
traditional Chinese medicine — are the probable animal 
source of the coronavirus outbreak causing global 
alarm. 

Shen Yongyi and Xiao Lihua reported at a press 
conference on 7 February that they had identified the 
pangolin as the potential source of the coronavirus, 
on the basis of a genetic comparison of coronaviruses
taken from the animals and from infected humans. 

Scientists have already suggested that the virus 
originally came from bats, because of the similarity 
of its genetic sequence to those of other known 
coronaviruses, but the pathogen was probably 
transmitted to humans by another animal. 

Researchers say the suggestion that pangolins spread 
the coronavirus to people seems plausible — but cau- 
tion that the work is yet to be published. 

The coronavirus has now infected tens of thousands
of people globally, more than 1,000 of whom have died.


NASA SOARS WHILE
OTHERS PLUMMET IN 
US BUDGET PROPOSAL 


NASA could see a 12% increase 
to its US$22.6-billion budget 
under a proposal released by
US President Donald Trump on
10 February. As science agencies
go, however, it is an outlier. The
budget request, which covers
all areas of government, cuts
deeply across most research 
spending for the 2021 fiscal year, 
which begins on 1 October 2020. 

The proposal includes 
$38.7 billion for the US National 
Institutes of Health, about a 7% 
cut to current funding levels. 

The US Department of 
Energy’s Office of Science would 
lose nearly 17% from 2020 
levels. It would also eliminate 
the popular Advanced Research 
Projects Agency—Energy, which 
received a record $425 million 
last year, and slash the budget 
of the office of energy efficiency 
and renewable energy by 74%. 

The proposal also seeks 
a $500-million cut for the 
National Science Foundation. 
The agency’s computer 
science and engineering 
section is the only one of its 
directorates that would see 
an increase, consistent with 
the administration’s plans to 
prioritize artificial intelligence 
and quantum computing. 

The Environmental Protection 
Agency’s budget would be 
slashed by roughly 26%, to 
$6.7 billion. 

Although Congress has 
repeatedly rebuffed the 
president’s requests for cuts 
to science — and has increased 
research spending — the budget 
proposal offers a view into the 
administration’s priorities. 

“Trump is being Trump,” says 
Michael Lubell, a physicist at 
the City College of New York 
who tracks US science policy.
“He can ask for what he wants, 
but it doesn’t mean it’s going to 
happen.” 


SUPER-PRECISE 
CRISPR TOOL
ENHANCED BY 
ENZYME ENGINEERING 


Researchers have boosted the 
accuracy of a technique based 
on the popular CRISPR-Cas9
genome-editing system by 
engineering enzymes that can 
precisely target DNA without 
introducing as many unwanted 
mutations. 

The enzymes, reported on 
10 February, could make a 
method called base editing 
more feasible as a tool to treat 
genetic diseases (J.L. Doman 
et al. Nature Biotechnol.
http://doi.org/dmegf; 2020).

Base editing uses the 
Cas9 enzyme to target DNA edits 
to a specific site, where other
enzymes chemically convert 
one DNA base into another. 
This offers greater control than 
conventional CRISPR-Cas9 
editing, but can still introduce 
‘off-target’ changes at random 
locations in the genome.

A team led by David Liu, a
chemical biologist at the Broad 
Institute of MIT and Harvard 
in Cambridge, Massachusetts, 
developed screening methods 
that can detect unwanted 
mutations without the need for 
costly full-genome sequencing. 
This allowed the team to identify 
new base-editing enzymes that 
can change the DNA base C 
to T without making as many 
off-target edits. The approach 
could allow researchers to 
develop safer gene therapies. 


News in focus 


A study on the social spider Stegodyphus dumicola was the first to be retracted.


‘AVALANCHE’ OF RETRACTIONS 
SHAKES BEHAVIOURAL- 
ECOLOGY COMMUNITY 


Allegations of fabricated data in papers on spider behaviour have 
prompted a university investigation and some soul-searching. 


By Giuliana Viglione 


A complex web is unravelling in the
field of spider research. On 5 February,
McMaster University in Hamilton,
Canada, confirmed that it was inves- 
tigating allegations that behavioural 
ecologist Jonathan Pruitt had fabricated data 
in at least 17 papers that he had co-authored. 
Since concerns about his work became 
public in late January, scientists have rushed 
to uncover the extent of questionable data 
in Pruitt’s studies. Publishers are now trying 
to keep up with requests for retractions and 
investigations. So far, seven papers have
been retracted or are in the process of being
retracted; five further retractions have been 
requested by Pruitt’s co-authors; and research- 
ers have flagged at least five more studies as 
containing possible data anomalies. 


A tangled web
Pruitt, who is reportedly doing field research 
in Australia and the South Pacific, told Science 
last week that he had not fabricated or manip- 
ulated data in any way. He did not respond to 
multiple requests from Nature for comment 
on the mounting list of retractions, or the 
accusation that he had fabricated data. 

His research looks at how different 
personalities form in communities of social
spider species that live in groups, and it has 
implications for emerging ideas on how 
animal behaviours evolve in the context of 
their environment. 

The retractions started in mid-January, 
when authors of a paper in The American 
Naturalist¹ pulled it, citing “irregularities in
the raw data”. These were data that Pruitt had 
provided, showing how long it takes social 
spiders to resume typical behaviours after a 
disturbance, such as a simulated attack from 
a predator. 

After a second retraction², Kate Laskowski,
a behavioural ecologist at the University of 
California, Davis, who had co-authored both 
studies with Pruitt, wrote a blogpost about 
those irregularities (see go.nature.com/ 
39m535t). She had found multiple stretches 
of data that had been copied and pasted to 
represent findings for multiple spiders. When 
Pruitt’s explanations failed to account for the 
anomalies, she requested that the journals 
retract the papers, reportedly with Pruitt’s 
consent. 

“Then, hell broke loose,” says Niels 
Dingemanse, a behavioural ecologist at Ludwig 
Maximilian University in Munich, Germany, 
who has helped to uncover the data issues. 

More than 20 scientists — co-authors, peers 
and other interested observers in the field — 
mobilized to pore through the data in almost 
150 papers on which Pruitt is a co-author, 
looking for evidence of manipulated or fab- 
ricated numbers. They found similar signs of 
copy-and-paste duplications. In at least one 
instance, researchers identified formulae that 
had been inserted into a published Excel file, 
designed to add or subtract from a pasted 
value and create new data points. 

Several have stated that they consider this 
clear evidence of fraud. Dingemanse says that 
his mind was made up by the “avalanche of 
retractions” in progress, as well as the mount- 
ing piles of irregular data. “It is hard to believe 
these data are not fabricated,” he says. 

The 17 papers that include questionable 
data have been cited more than 900 times, and 
it will take scientists a while to sort out which 
ideas have been supported elsewhere in the 
literature and which will need to be retested. 
“My guess is the impact will probably be pretty 
big,” Laskowski says. 


Pruitt had written “a lot of really impressive 
papers” and was regarded by many as a “rising
star”, says Maria Rebolleda-Gómez, a microbial
ecologist at Yale University in New Haven,
Connecticut. 

A spokesperson for McMaster University 
confirmed that the institution was investigat- 
ing, but would provide no further comment 
on issues of research integrity. The Univer- 
sity of California, Santa Barbara, where Pruitt 
did most of the work in question, declined to 
comment on the specific case but said that it 
“would cooperate with any other institution 
conducting an investigation”. 


Laskowski says that although the wave 
of retractions deals a blow to behavioural 
ecology, she is heartened by how quickly the 
community has acted to set the scientific 
record straight. Researchers have lessons to 
learn about making data publicly available 
— by one estimate, more than 60% of Pruitt’s 
data-containing papers are in journals with no
data-sharing requirements — and about check- 
ing data that they receive from colleagues. 
But she and others are optimistic that these 
lessons will ultimately strengthen the field. 


1. Laskowski, K. L., Montiglio, P.-O. & Pruitt, J. N. Am. Nat. 
187, 776-785 (2016); retraction 195, 393 (2020). 

2. Laskowski, K. L. & Pruitt, J. N. Proc. R. Soc. B 281, 20133166
(2014); retraction 287, 20200077 (2020). 


JOURNAL BANS HIGHLY 
CITED RESEARCHER 
FOR CITATION ABUSE 


Probe finds that Kuo-Chen Chou repeatedly 
suggested dozens of citations be added to papers. 


By Richard Van Noorden 


A US-based biophysicist who is one
of the world’s most highly cited
researchers has been removed from
the editorial board of one journal
and barred as a reviewer for another,
after repeatedly manipulating the peer-review
process to amass citations to his own work.

On 29 January, three editors at the Journal
of Theoretical Biology (JTB) announced in an
editorial that the journal had investigated and
barred an unnamed editor from the board for
“scientific misconduct of the highest order” 
(M. Chaplain et al. J. Theoret. Biol. 488, 110171;
2020). 

The journal's publisher, Elsevier, confirmed 
to Nature that the barred editor is Kuo-Chen 
Chou, who founded and runs an organization
that he calls the Gordon Life Science Institute, 
in Boston, Massachusetts. According to the 
editorial, Chou asked authors of dozens of 
papers he was editing to cite a long list of his 
publications — sometimes more than 50 — and 
suggested that they change the titles of their
papers to mention an algorithm he had 
developed. 

“The magnitude of his self-citation requests 
are shocking,” says Jonathan Wren, an associ- 
ate editor for Bioinformatics, a journal that last
year barred Chou from reviewing its papers, 
although it did not name him at the time. 
“But what blows my mind is that suspicious 
citation patterns to him go back decades and 
authors comply with an apparently amazing 
frequency.” 

Chou retired from a career in the pharma- 
ceutical industry in 2003. He then founded the 
Gordon Life Science Institute, which he calls 
an institute with “no physical boundaries”, of
which anyone can become a member. Before
2003, Chou had published 168 papers — mostly
in the field of computational biology — which
were cited around 2,000 times. But he now has 
602 papers with more than 58,000 citations, 
according to Elsevier’s Scopus citations data- 
base. He is one of the world’s most highly cited 
researchers. 

The JTB editorial says that Chou also handled
papers written by close colleagues at his
own institute — some of whom the journal later 
couldn’t trace, which the editorial says calls 
into question their veracity. It adds that Chou 
sometimes reviewed papers under a pseudo- 
nym, or chose reviewers from his institution. 
And in many cases, Chou was added to papers 
as a co-author during the final stage of review.

“Regrettably, this process was repeated for 
dozens of papers,” the editorial says. It adds 
that the journal wants to “apologize for missing
this blatant misuse of the editorial system”.

Chou told Nature that mentions of his algo- 
rithms in papers were “not from ‘reviewer 
coercion’, but from their very high efficacy 
and widely recognized by many users”. But he 
declined to answer questions about the cita- 
tion practices for which he was banned, and 
instead referred Nature to his website. 

Wren flagged the suspicious citation patterns
to the JTB after an investigation at his
own journal. That probe revealed that in 
every review, Chou had requested that man- 
uscript authors add citations — an average of 
35 of them, 90% to papers he had co-authored. 
Bioinformatics announced that it had barred 
a referee in January 2019.
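
The two figures reported from the Bioinformatics probe, the number of citations requested per review and the share of those that point to the reviewer's own papers, are simple to compute once review records are available. The sketch below is purely illustrative: the `Review` record, the function names and both thresholds are assumptions made for this example, and it is not Wren's algorithm, which the article does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    """One referee report (hypothetical record layout, not from any journal system)."""
    # Each requested citation is represented by the list of its authors' names.
    requested: list = field(default_factory=list)


def self_citation_stats(reviews, reviewer):
    """Return (mean citations requested per review, share that name the reviewer)."""
    total = sum(len(r.requested) for r in reviews)
    own = sum(reviewer in authors for r in reviews for authors in r.requested)
    mean_requested = total / len(reviews) if reviews else 0.0
    self_share = own / total if total else 0.0
    return mean_requested, self_share


def is_suspicious(reviews, reviewer, max_mean=10.0, max_self_share=0.5):
    """Flag a reviewer whose requests exceed both thresholds (threshold values are assumed)."""
    mean_requested, self_share = self_citation_stats(reviews, reviewer)
    return mean_requested > max_mean and self_share > max_self_share
```

Under these assumed thresholds, a reviewer who asks for an average of 35 citations, 90% of them to their own papers, would be flagged; where to set the cut-offs in practice is a judgement call for each journal.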

Wren, a bioinformatician at the Oklahoma 
Medical Research Foundation in Oklahoma 
City, says investigations into Chou’s citations 
are under way at at least three other journals 
to which he has pointed out suspicious pat- 
terns. Wren is currently writing an algorithm 
to flag unusual citation patterns in papers 
automatically. 

The case comes amid efforts by Elsevier 
to crack down on the practice of ‘coercive 
citation’. Last year, the Amsterdam-based 
publisher said it was investigating hundreds
of researchers whom it suspected of
manipulating peer review to boost their
citations. Chou’s case is the first to be 
revealed since that announcement. “While 
thankfully rare, such practices are an abuse 
of the peer-review system and undermine the 
hard work and commitment that editors and 
reviewers devote to ensuring the integrity of 
the scholarly record,” a spokesperson says. 
“Elsevier has developed analytical tools to 
help detect such practices and is committed 
to implementing technology to flag citation 
manipulation before publication.” 


From 2014 to 2018, Chou was named as a 
highly cited researcher in a list produced by
Clarivate Analytics, an information-services 
firm in Philadelphia, Pennsylvania, that owns 
the citation database Web of Science. But his 
name does not appear on the 2019 list; last 
year, Clarivate decided to remove scientists 
whose papers showed “unusually high levels 
of self-citation”.

Elsevier hasn't yet decided what to do about 
papers that Chou handled that liberally cite his 
work, the spokesperson says. 


THE PROTEIN-IMAGING 
TECHNIQUE TAKING OVER 
STRUCTURAL BIOLOGY 


The number of structures being determined by 
cryo-electron microscopy is growing explosively. 


By Ewen Callaway 


A revolutionary technique for
determining the 3D shape of
proteins is booming. Last week, a
database that collects protein and
other molecular structures obtained
using cryo-electron microscopy, or cryo-EM, 
acquired its 10,000th entry. 

Submissions to the Electron Microscopy 
Data Bank (EMDB) — a popular repository for 
structures solved using electron microscopy 
— have increased exponentially in recent years, 
largely because of the explosive growth in the
number of cryo-electron microscopes in lab- 
oratories worldwide (see ‘Structure sleuths’). 
The EMDB curates structures solved with other 
microscopy methods, but the vast majority 
use cryo-EM. 

The technique involves flash-freezing 
solutions of proteins or other biomolecules, 
and then bombarding them with electrons 
to produce microscope images of individual 
molecules. These are used to reconstruct the 
3D shape, or structure, of the molecule. Such 
structures are useful for uncovering how pro- 
teins work, how they malfunction in disease 
and how to target them with drugs. 

For decades, structural biologists preferred 
to use X-ray crystallography, a technique that 
involves crystallizing proteins, pummelling 
them with X-rays and reconstructing their 
shape from the resulting tell-tale patterns of 
diffracted light. X-ray crystallography pro- 
duces high-quality structures, but it’s not easy 
to use with all proteins — some can take months 
or years to crystallize, and others never crys- 
tallize at all. Cryo-EM doesn’t require protein 
crystals, but the technique languished because
it tended to produce low-resolution structures
— some scientists called it blobology.

Breakthroughs in hardware and software
in 2012-13 produced more sensitive electron
microscopes and more sophisticated software
for transforming the images they captured
into sharper molecular structures.
That paved the way for the current growth
of cryo-EM, says Sjors Scheres, a structural
biologist and specialist in the technique at the
MRC Laboratory of Molecular Biology (LMB)
in Cambridge, UK.


STRUCTURE SLEUTHS


Most structures of proteins and other biological
molecules are still solved with X-ray crystallography.
But a revolutionary technique called cryo-electron
microscopy (cryo-EM) is catching up.

Richard Henderson, an LMB structural 
biologist who shared the 2017 Nobel Prize in 
Chemistry for his work developing the tech- 
nique, says that even after these advances, 
growth was slow at first, because only a small
number of labs had access to the equipment. 
But when they started using cryo-EM to pro- 
duce detailed maps of molecules such as the 
ribosome — cells’ protein-making machines 
— other scientists, as well as their institutions 
and funders, quickly took notice. “All the 
people who had invested in other things and 
made the wrong decisions, it took them a year
to catch up,” says Henderson. 

He estimates that, by 2024, more protein 
structures will be determined by cryo-EM 
than by X-ray crystallography. Cryo-EM has 
already supplanted X-ray crystallography for 
one category of proteins that scientists are 
especially interested in — those embedded in 
cell membranes. Many such membrane-bound 
proteins are implicated in disease and serve as 
targets for drugs. 


Advanced imaging 


The structures of molecules determined by 
cryo-EM are also getting more detailed, thanks
to continuing improvements in hardware and 
software, says Scheres. 

Initially, the sharpest cryo-EM structures 
were of highly stable proteins that were used 
to test the limits of the technology. But Scheres
has noticed that researchers are increasingly 
obtaining very high-resolution structures 
of medically important molecules, such as 
cell-membrane proteins, even though they 
tend to flop around. 

“We're now coming to the point where the 
easy samples have been done and people are 
looking at more complex problems,” says 
Ardan Patwardhan, a structural biologist at 
the European Molecular Biology Laboratory- 
European Bioinformatics Institute in Hinxton, 
UK, who leads the team that runs the EMDB. 

Henderson expects the boom in cryo-EM 
structures to slow at some point. One factor 
that could sap growth, he says, is the high cost 
of the most powerful microscopes, which can 
exceed £5 million (US$7 million). They also 
cost thousands of pounds each day to run, 
and require specialized labs that minimize 
vibrations. Henderson is campaigning to 
convince firms to develop cheaper, but still 
useful, microscopes that could spread the 
technique even further. “At the moment, you 
cannot go wrong by putting more investment 
into cryo-EM,” he says.


A soldier patrols La Pampa, an area in the Peruvian Amazon that was once lush rainforest.


CAN A RAINFOREST
DESTROYED BY GOLD- 
MINERS BOUNCE BACK? 


Illegal miners have left La Pampa, giving researchers 
access to an inadvertent experiment in restoration. 


By Jeff Tollefson 
in La Pampa, Peru

“Holy shit!” Miles Silman gasped as
his motorized rickshaw rattled
out of the forest and onto a desolate
beach. All traces of the
trees, vines and swamps that 
once covered this patch of the Amazon rainfor- 
est had vanished. In their place were sun-baked 
dunes and polluted ponds created by illegal 
gold-mining. Silman, a conservation biologist 
at Wake Forest University in Winston-Salem, 
North Carolina, was there to document the 
carnage. 

La Pampa was once the largest and most 
dangerous gold-mining zone in the Peruvian 
Amazon. It was so riddled with gangsters that 
scientists dared not enter, and, for nearly a
decade, could only watch by satellite as gold
hunters mowed down some of the most bio- 
diverse rainforest on the planet. That ended 
in February 2019, when the government 
declared martial law and expelled an estimated 
5,000 miners. 

Now, La Pampa is deserted and under 
military guard. When Silman and his colleagues
surveyed the area for the first time in late June,
they found a barren, eerily quiet landscape 
polluted with mercury, a toxic by-product of 
mining. The data that the researchers collect 
during this inadvertent experiment could help 
to determine the extent to which restoration 
is possible — or document the evolution of an 
entirely new, and human-made, ecosystem. 

Silman and his colleagues at the Center for
Amazonian Science and Innovation (CINCIA),
a non-profit research institute in Puerto 
Maldonado, Peru, have spent the past several 
months mapping the area with drones and 
surveying the remaining plants and animals. 
The team has been studying dozens of tree 
species to see which can survive among the 
dunes and along the shores of ponds. 

CINCIA scientists have also tested the air, 
water and soil for mercury contamination. 
Another team, from Duke University in Dur- 
ham, North Carolina, has collected data there 
to help unravel how mercury — which can harm
children’s brain development — moves from 
polluted water or soil and up the food chain. 

The research by the CINCIA team and other 
scientists will feed into the Peruvian govern- 
ment’s ongoing efforts to rehabilitate the area, 
says Camila Alva, director of pollution control
and chemical substances at the country’s
environment ministry. 

The government has already begun a pilot 
project to restore the Tambopata National 
Reserve, a protected forest that miners 
invaded when La Pampa expanded. Peruvian 
President Martin Vizcarra visited the reserve 
on 5 December in a show of support. Results
from that work could help to guide the gov- 
ernment’s longer-term efforts to reforest, and 
perhaps even resettle, parts of La Pampa. 

La Pampa started out as a roadside outpost
on the Interoceanic Highway, one of the first
stops for would-be miners who flowed from 
the Andes into the Madre de Dios region of the 
Peruvian Amazon when gold prices spiked a 
decade ago. The regional capital, Puerto 
Maldonado, grew into a mining hub. And 
La Pampa became a bustling town of some 
25,000 people, with a reputation for prosti- 
tution, modern slavery and organized crime. 

The boom proved too lucrative to control. 
The gold dust is almost everywhere, and with 
a few rudimentary devices — including a pet- 
rol-powered water pump and a hand-made 
sluice — anyone can collect silt. Then it’s just 
a matter of mixing in mercury, which binds 
the gold, to recover as much as 10-15 grams of
gold per day. That is several hundred dollars’
worth on the global market.

Miners don’t worry about mercury’s inevita- 
ble release into the environment or the health 
effects of exposure, says Luis Fernandez, 
CINCIA’s executive director. 

But researchers want to understand how 
much mercury miners left behind, and how 
it is moving through the ecosystem. On a
sunny June day in La Pampa, the CINCIA team 
explored the site in preparation for research 
that will look for mercury contamination in 
the air, water and soil, as well as birds, fish and 
other aquatic life. 

“We have birds and insects, that’s some- 
thing that we can sample,” said environmen- 
tal chemist Claudia Vega, who coordinates 
CINCIA’s mercury research programme. The 
tests will help to determine how much mercury 
could be moving into the food chain, where it 
would pose a danger to people, including the 
farmers who have laid down claims for land 
in La Pampa. 

So far, testing by CINCIA and other research- 
ers suggests that the mercury contamination 
is concentrated in the ponds. That means the 
land is probably safe for farming, but that 
eating fish that live in the ponds could be 
dangerous. “We cannot put people out there 
— families and children — if we don’t know what
the risk is,” says Martin Arana, a forest engineer
who is advising the Peruvian forest service.


Drone maps dunes 


The team is also measuring the extent of 
deforestation in La Pampa and the potential to
reforest the area. During their trip in June, the
researchers launched their first drone flight
on a dune deep inside the mining zone. The
drone flew north-south transects at a height 
of around 200 metres to produce detailed 
3D maps of the area’s topography, including 
its dunes and ponds. 

Silman’s team is using those maps to esti- 
mate how much carbon was released into 
the atmosphere when the forest was mowed 
down to make way for mining activities. That 
information can also be used to track forest 
recovery and guide future plant surveys.

Illegal gold-mining has transformed forested land into sand and ponds.


Government officials are assessing the cost 
and technical feasibility of a major reforesta- 
tion effort — as well as the jobs that it might 
produce. Working with CINCIA, Peru’s park 
service and environment ministry have already 
launched their pilot reforestation project 
on 30 hectares of the Tambopata National 
Reserve. The agencies are planning to repli- 
cate that work across more than 750 hectares 
in the reserve. 

The forest service is also studying how to 
design, implement and pay for an even larger 
reforestation project in La Pampa that could 
begin in a few years. But Arana says that the 
government will have to remain vigilant to
the threat of illegal mining. “What happens 
if the price of gold is very, very high?” he 
asks. “Maybe the illegal miners come back to 
La Pampa, and there will be conflict with the 
people who are working in reforestation.” 

Silman is interested in understanding how
different types of vegetation will recolonize
the landscape naturally — and whether people
might be able to guide and accelerate the
process of reforestation. Strangler fig trees,
which typically start life high in tree tops where
light is abundant and then strangle their hosts,
are already sprouting up in La Pampa’s dunes,
alongside burrowing owls (Athene cunicularia)
that typically nest in arid shrublands.

Silman’s team has been growing test plots
of more than 75 plant species to guide the
reforestation push. The scientists are tracking
how the plants perform in a variety of
conditions: some prefer flat terrain with direct
sun, whereas others need shade or very moist 
soil. The team’s results suggest that adding 
charcoal — or a similar substance called 
biochar — to the soil bolsters plant growth 
and survival (D. Lefebvre et al. Forests 10, 678; 
2019). “We want to give people options, so that 
we aren't just planting trees that are going to 
die,” Silman says. 

Miners stripped the region’s soils of all of 
their nutrients and fundamentally changed the 
way water moves through the landscape. That 
will make restoration attempts in La Pampa 
more difficult than in other areas where people
have rebuilt or restored ecosystems harmed 
by mining, says Stuart Pimm, an ecologist at 
Duke University. 

But rather than worrying too much about 
trying to recreate what was there before, Pimm 
says that scientists and the government should 
get some plants in the ground and let nature 
take its course. “Just getting some forest cover 
is something they can probably do,” he says, 
“and it’s going to be a hell of a lot better than
a barren landscape with some toxic puddles
in the middle.”

As Silman and his colleagues wrapped up 
their day of field work in June, the Sun was setting
— and La Pampa was coming alive. Ducks
were on the move, and fish in ponds began 
rising to feed on insects. 

Silman has little doubt that plants and ani- 
mals will recolonize this largely empty space 
over hundreds or thousands of years. The 
question, he says, is whether scientists can 
help to accelerate that recovery, or whether 
La Pampa will remain little more than a mon- 
ument to human stupidity over the coming 
decades. 

“That land has already been deforested,” 
he says. “There’s a lot of incentive for us to be 
clever and to try to do good things there.”


Around the globe, drained peatlands are emitting billions of tonnes of carbon
dioxide each year. To keep climate change in check, governments and researchers
are working to keep peatlands healthy. By Virginia Gewin


On a chilly September morning in
Scotland’s northern highlands, 
a giant excavator rumbles back 
and forth across peatlands that 
stretch to the horizon. As the wind 
whips across the mossy terrain, the 
machine’s operator is undoing dec- 
ades of damage by smoothing out 
the drainage ditches that scar the landscape. 

The peat here can reach up to 10 metres 
deep and developed slowly over thousands of 
years. Then, in the middle of the twentieth century,
Scotland embarked on an ill-fated effort
to transform the bogs into tree farms. Land- 
owners ploughed trenches to drain bogs and 
planted pine trees and spruce that often failed 
to thrive. As the ventures struggled, research- 
ers and the Scottish government started to see 
the peatlands in a fresh light, recognizing that
they lock up vast amounts of carbon. If they are
not kept healthy, the bogs could release their
stored carbon and accelerate global warming.

A flux tower in Scotland's Flow Country
measures gas concentrations and other
variables in a peatland.

That’s why a team of researchers and land 
managers is digging up trees and flattening 
furrows in former plantations southwest 
of Thurso. The effort is part of a roughly 
£50-million (US$65-million) investment that
the Scottish government and other organ- 
izations have made towards restoring the 
country’s blanket bogs — undulating carpets 
of spongy hummocks built from Sphagnum 
mosses. The largest area of blanket bogs in 
the world is in the Flow Country — a low-lying
expanse between sheer cliffs to the north and 
glacially carved mountains to the southwest. 

Remote and exposed, these peatlands are 
named after the Norse term floi, which means 
boggy ground. They have long been described 
as worthless wastelands. “Local people called 
the peatlands mamba — miles and miles of 
bugger all,” says Roxane Andersen, a bioge- 
ochemist at the University of the Highlands 
and Islands’ Environmental Research Institute 
in Thurso. 

More than 80% of the 1.7 million hectares 
of peatland in Scotland have been cut for fuel 
or otherwise degraded, and roughly 500,000 
hectares have been drained and forested with 
non-native conifers. “The reality, though, is the
trees did poorly,” says Andersen. 

Despite that, the peatlands have tremen- 
dous value for carbon storage. These areas 
hold more than one-quarter of all soil carbon, 
even though they account for only 3% of Earth’s 
land area¹. Globally, peatlands hold more than
twice as much carbon as the world’s forests do,
according to the United Nations Environment 
Programme. 

But in many places, humans have turned 
vast expanses of these environments from 
long-term carbon sinks into carbon sources. 
Damaged or drained peatlands worldwide 
emit at least 2 billion tonnes of carbon diox- 
ide annually — roughly 5% of anthropogenic 
greenhouse-gas emissions — largely through 
peat fires and oxidation of the buried carbon. 
And emissions from bogs are expected to rise 
sharply. 

As the threat of climate change has grown 
more severe, researchers and governments 
have identified peatlands as ideal targets for 
stopping emissions, and even sopping up car- 
bon. Although Canada, Russia and Indonesia 
contain the largest tracts of peatland in the 
world, Scotland has emerged as a leader in 
the effort to restore the habitat, which cov- 
ers more than 20% of the country (see ‘For 
peat’s sake’). Scotland will probably meet, if 
not exceed, its 2020 goal of restoring 50,000
hectares, mainly on government-owned
nature reserves and forestry land. And it aims 
to push that total to 250,000 hectares by 2030. 

Restoring peatlands to health is one of the 
key ways in which Scotland, which last April 
became the first country to declare a climate 
emergency, intends to reach net-zero green- 
house-gas emissions by 2045. “Scotland has 
raced out in front by making good connections
with researchers and government,” says
Jack Rieley, a tropical-peatland ecologist and
executive board member of the International
Peatland Society, which is based in Jyväskylä,
Finland. Researchers from around the world 
have flocked to Scotland to glean insights into 
how to develop a successful national strategy 
for restoring peatland. 

The biggest question is whether restora- 
tion will simply stop carbon emissions from 
peatlands or revive the bogs to the point that 
they can store more carbon. Other countries, 
notably Indonesia, are also pursuing efforts 
to reduce carbon losses from their peatlands. 
To make sure that these projects are working, 
researchers are developing satellite tech- 
niques and other tools to monitor the health 
of these landscapes. 

But there is no guarantee that the efforts will 
pay off. “It’s so easy to break an ecosystem, and 
it’s so hard to bring it back,” says Andersen. “We 
can’t recreate something from the past, but we 
can do our best to make it resilient.” 


Tough going 

Just over 100 kilometres southwest of Thurso, 
the boggy soil is so sodden in spots that I sink 
up to my knees and nearly lose a boot. But the 
muck hasn’t stopped two excavators — each 
more than 13 tonnes — that are fitted with
extra-wide tracks to distribute their weight. 
As part of an effort to convert the region back 
to bogs, they trundle across the peat, cutting 
and stacking stands of trees that have been 
there for 30 years. 

The timber is low quality, pockmarked by 
hungry pests and prone to being blown down, 
a hallmark of trees that are growing in acidic 
peat. Neil McInnes and Tim Cockerill oversee 
this and other restoration projects undertaken 
by Forestry and Land Scotland, a government
land-management agency based in Inverness. 
The harvest costs more than the timber is 
worth, and because the trees will be either 
incinerated on site to generate electricity or 
made into heating pellets, the carbon in the 
trees will return to the atmosphere. 

Removing the trees was a bitter pill at first. 
Many foresters felt they were being unfairly
criticized for having planted them in the first
place — even though it had been a government
directive at the time. But McInnes says that 
attitudes have changed over the past few 
years as people have grown to understand the 
carbon-storage potential of peatlands, and the
Scottish government has made it a priority to
reduce emissions. “It doesn’t feel like a fight 
any more,” he says. 

Early peatland-restoration efforts began in
Flow Country in 1995, and focused more on
restoring bird habitats. “Carbon was barely on the
agenda at that time,” says Norrie Russell, for- 
mer manager of the Forsinard Flows reserve, 
which is owned by the Royal Society for the 
Protection of Birds and is where Andersen 
conducts her research. 

The agenda gained momentum in 2010, 
when the International Union for Conserva- 
tion of Nature launched the UK Commission 
of Inquiry on Peatlands to assess the state of 
these ecosystems. That effort — along with 
widespread support for tackling climate 
change — triggered more interest in nursing 
peatlands back to health. Now, Russell says, 
the political push for peatland restoration is 
focused mainly on keeping carbon locked up. 
In a 2017 public survey (see go.nature.com/2sfvbiy),
the vast majority of respondents supported
restoration to mitigate climate change,
to improve water quality and wildlife habitat 
and to protect this important aspect of Scot- 
land’s identity. 


Towers of resilience 


Andersen is working with McInnes and Cock- 
erill, as well as various organizations, to 
determine how best to manage the land for 
carbon storage. To gather evidence, she and 
her colleagues have installed four towers in 
Flow Country since 2008 to monitor the flow 
of gases and temperature, among other vari- 
ables. Sensors near the towers measure heat 
flux, water level, soil temperature and pre- 
cipitation. Building on existing data, Ander- 
sen won a £986,088 award last year from the 
London-based charity the Leverhulme Trust 
to determine how to make peatland resilient.

In the data collected so far, Andersen and 
her colleagues have detected some promising 
changes². They found that the first patches of
restored peatlands, in which trees were simply
cut and rolled into the blocked drainage
ditches, switched from a carbon source to a
carbon sink after 16 years. Although that work
demonstrated that transitioning forest back to
bog can be an effective way to restore a carbon
sink, the researchers found that they could
get faster results with more intensive management
— such as clearing the carbon-rich
trees and branches and flattening the ground.
Although these more intensive strategies can
trigger an initial pulse of greenhouse-gas emissions
by disturbing the soil, once it is more uniformly
wet this can also accelerate the switch
from carbon source to sink — bringing it down
to as little as ten years, says Andersen.

Peatlands hold more than one-quarter of the planet's soil carbon, yet cover just 3% of the land.

These results mirror research in Canada that 
found it takes one to two decades for peatlands 
to recover following restoration efforts³. The
trick to restoring the natural hydrology, the 
way water moves through the system and is 
stored by the peat, is choosing locations that 
aren't too degraded and where there is still 
enough residual peat and plant vegetation, 
says Nigel Roulet, a peatland scientist at McGill 
University in Montreal, Canada. “If you nudge
systems along, and pamper them through first
years of recovery, they take off on their own,”
says Roulet, “and carbon dynamics return to a
natural system within a decade or two.” 

But that’s a complicated story to convey 
— especially amid a groundswell of support 
around the globe for efforts to plant trees to 
combat global warming. Last year, a study sug- 
gested that Earth’s ecosystems could support 
1 billion more hectares of forest — and store 
25% of the atmospheric carbon pool⁴. Politicians
in many countries, including the United
Kingdom, have been eagerly promoting 
efforts to plant more trees. Scotland planted
11,200 hectares of new woodlands in 2018. And
in the run-up to last December’s UK general
election, both the Labour and Conservative 
parties promised to plant millions more trees 
each year. These new arboreal ambitions could 
make it harder for researchers and officials 
to argue that peatlands are the wrong places 
for trees. “Unless landowners and managers 
all work together on an agreed strategy then 
there will be pressure,” says McInnes. “We’ve 
seen this before.” 


Breathing bogs 

The key question about restoration efforts 
across the globe is how well they can slow 
greenhouse-gas emissions from bogs. To 
answer that, researchers need cheaper and 
faster tools for assessing the health of peat- 
land over wide areas. Andersen has partnered 
with geoscientist David Large at the University 
of Nottingham, UK, to develop a method for
monitoring ‘bog breathing’ through satellite 
measurements — specifically, interferomet- 
ric synthetic aperture radar (InSAR). Because 
peatlands that are functioning well rise and 
fall with the level of the water table, the carbon 
emissions can be inferred from how the peat 
behaves, says Large. 

The team tested this method on 22 sites 
around the Flow Country over 18 months and 
found that wet, mossy peat in good condition 
— the least likely to be a carbon source — rises
in mid-winter and falls in mid-summer⁵. Drier,
shrubby peat, which is more likely to emit 
carbon, rises in late spring and falls in late 
summer. As a next step, the researchers plan 
to correlate their InSAR results with measure- 
ments of carbon emissions. 
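
The seasonal pattern described above can be expressed as a simple rule on the timing of the highest and lowest surface levels in a year. The sketch below is a hypothetical illustration, not the researchers' method: the function name, the input format (one mean surface height per calendar month) and the month ranges chosen for 'mid-winter', 'late spring' and 'summer' in the Northern Hemisphere are all assumptions.

```python
# Assumed Northern Hemisphere month groupings (illustrative only).
WINTER = {12, 1, 2}
LATE_SPRING = {4, 5}
MID_SUMMER = {6, 7, 8}
LATE_SUMMER = {8, 9}


def classify_peat(monthly_height):
    """Classify a site from a dict mapping month number (1-12) to mean surface height.

    Uses only the months in which the surface is highest and lowest.
    """
    peak = max(monthly_height, key=monthly_height.get)
    trough = min(monthly_height, key=monthly_height.get)
    if peak in WINTER and trough in MID_SUMMER:
        return "wet, mossy"       # rises in mid-winter, falls in mid-summer
    if peak in LATE_SPRING and trough in LATE_SUMMER:
        return "drier, shrubby"   # rises in late spring, falls in late summer
    return "unclassified"
```

A rule this coarse ignores the size of the movement and any year-to-year variation, so it could serve at most as a first-pass label before the comparison with measured carbon emissions that the researchers plan.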

InSAR will offer funders and government 
officials a means of quantifying success, says 
Large. “At what point is peat restored? We’ve 
spent millions and haven't really thought
through what success will look like,” he says,
at least in terms of metrics. Large is now testing
the tool in tropical peatlands, which he says are
challenging because in areas such as southeast
Asia, peat builds only under forest cover, and
the trees cause trouble for InSAR. If the methodology
can be validated across peatland types
and conditions, it could help governments to
choose which areas to restore and to monitor
how effective interventions have been, says
Susan Page at the University of Leicester, UK,
who studies peatlands in southeast Asia.

Scotland's Flow Country is the world’s largest area of blanket bogs.

Other teams are developing different methods
for monitoring peatland emissions. In the
tropics, for example, researchers are tracking 
deforestation, which often precedes efforts to 
drain the peatlands. Every country will have to 
develop its own monitoring system, says Hans 
Joosten, a peatland ecologist at the University 
of Greifswald in Germany. 

Monitoring is urgently needed in many 
regions, including Indonesia. The country is 
plagued by seasonal fires that spread over 
dry peatlands and send billows of smoke 
across much of the country. The fire risk has 
increased in the past few decades because 
dams were installed to drain the country’s 
peat and grow crops — notably oil palm trees,
which do best when the water table is roughly 
80 centimetres below the surface. Following 
devastating peat fires in 2015, Indonesia set an 
ambitious goal to restore 2 million hectares, 
about 10% of the roughly 20 million hectares of 
the country’s original peat swamp forests, by 
2020 to prevent fires and improve air quality. 

By the end of last year, the campaign had
re-wetted about 788,000 hectares, which
involves raising the water table to within
40 centimetres of the surface. Nazir Foead,
head of Indonesia’s Peatland Restoration
Agency, says investigations in the country
found that “when the table fell below 40 centi- 
metres, the fire incidences soar significantly”. 
Indonesia plans to achieve more than half of 
its carbon-reduction goals to support the Paris 
climate agreement through re-wetting and 
protecting peatlands. 

In theory, these plans should reduce Indo- 
nesia’s emissions, but they probably won't 
restore peat’s ability to store new carbon, 
according to several researchers. “Re-wetting 
is the initial stage towards peatland restora- 
tion but it is not the magic bullet,” says Rieley. 
Unlike in Scotland where mosses build up peat, 
trees are needed to deposit peat layers in trop- 
ical systems. In Indonesia, “where is the peat 
going to come from?” asks Rieley. Foead says 
his agency can’t yet quantify how many trees 
have actually been replanted. 

Even if Indonesia doesn’t turn its peatlands 
back into a carbon sink, Joosten argues that 
re-wetting to 40 centimetres below the peat 
surface will reap big rewards from a climate 
perspective. Doing so would cut emissions 
from re-wetted areas by 50% because it halves
the amount of peat exposed to oxidizing conditions.
And it would reduce global emissions
much more than Scotland’s endeavours, says 
Joosten, who was part of an international team
that, in 2018, won the Indonesia Peat Prize, 
which is awarded by the government and the 
David and Lucile Packard Foundation, based 
in Los Altos, California. The team devised a 
method to map the extent and depth of peat. 


Rare efforts 


A fundamental problem is that large-scale peatland
restoration is happening in just a few locations,
say researchers. In fact, the global total
peatland area is decreasing because bogs con- 
tinue to be drained in the tropics and the land 
is converted for other uses. If that continues, 
carbon released from peatlands will help to 
send the global temperature shooting past the 
target of 1.5-2 °C warming above pre-industrial 
levels set by the Paris agreement. 

One complication in the effort to re-wet 
peatlands is that restored wetlands will pro- 
duce some amount of methane, which is a 
potent greenhouse gas. But Joosten says that 
this will be more than balanced by the reduction
in emissions of carbon dioxide and nitrous
oxide. Overall, re-wetting has a net benefit for
the climate. Rather than aiming to turn global
peatlands into sinks, he says, a more realistic
near-term goal is to make bogs carbon neutral.

Achieving carbon neutrality for peatlands 
across the globe would have a major impact. 
Last year, Page and her colleagues found that 
by 2015, drained peatlands had emitted about 
80 billion tonnes of carbon dioxide — and that 
this cumulative amount would roughly triple 
by 2100 (ref. 6). Estimates suggest that nations 
will need to limit future carbon-dioxide emis- 
sions to something on the order of 400 billion 
to 1,600 billion tonnes to keep temperatures 
from rising above the Paris target. But peat- 
lands are on track to account for roughly 
10-40% of that budget, unless countries take 
steps to protect and restore these environ- 
ments, according to Page and her colleagues. 
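
The 10-40% range quoted above can be reproduced with simple arithmetic. The sketch below is a back-of-the-envelope check, not the calculation used by Page and her colleagues: it assumes that "roughly triple" means exactly three times the 2015 total, and all variable and function names are illustrative.

```python
# Back-of-the-envelope check of the peatland share of the carbon budget.
# Assumption: "roughly triple" is treated as exactly 3x the 2015 total.

EMITTED_BY_2015_GT = 80.0          # billion tonnes of CO2 from drained peatlands
TRIPLING_FACTOR = 3.0              # cumulative total by 2100 (assumed exact)
BUDGET_RANGE_GT = (400.0, 1600.0)  # remaining budget range quoted in the text


def future_peat_emissions(emitted: float, factor: float) -> float:
    """Emissions still to come between 2015 and 2100."""
    return emitted * factor - emitted


def budget_share(future: float, budget: float) -> float:
    """Fraction of a remaining carbon budget consumed by peatlands."""
    return future / budget


future = future_peat_emissions(EMITTED_BY_2015_GT, TRIPLING_FACTOR)
small_budget, large_budget = BUDGET_RANGE_GT
share_of_large_budget = budget_share(future, large_budget)
share_of_small_budget = budget_share(future, small_budget)

print(f"Future peat emissions: {future:.0f} billion tonnes")
print(f"Share of budget: {share_of_large_budget:.0%} to {share_of_small_budget:.0%}")
```

Under these assumptions, drained peatlands would add about 160 billion tonnes after 2015, which is one-tenth of the largest budget and four-tenths of the smallest.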

To keep that from happening, says Joosten, 
“all drained peatlands in the world have to be 
re-wetted. No cherry-picking which are easiest, 
cheapest or most effective any more”. Indeed, 
the United Nations Environment Assembly 
adopted its first ever peatland resolution last 
year, urging member states to conserve and 
restore these carbon-rich ecosystems. 

Still, researchers say it will be important to 
document how much carbon is lost or stored 
in different peatlands, so that countries can 
meet their targets for the Paris climate accord 
and future agreements. And basic information 
about peatlands — including their extent and 
depth — is still lacking in many areas. Just three 
years ago, scientists discovered the world’s 
largest continuous tropical peatland in the 
Congo basin of central Africa.

“It is impossible to monitor greenhouse-gas
emissions over such large areas directly in
practice — no country in the world does that,” 
says Joosten. In Indonesia, non-governmen- 
tal organizations have highlighted that there 
is no independent monitoring of re-wetting 
effectiveness, he says. 

And despite efforts to raise the water table 
in large swathes of Indonesia’s peatlands, the 
country faced one of its worst fire seasons in 
2019. “The areas that burnt were sites that were 
restored,” says Lahiru Wijedasa, a peatland 
ecologist at the National University of Singa- 
pore who is studying Indonesia’s peatlands. 




“We are at the early stages of understanding 
how these ecosystems function as a whole,” he
says. The fires call into question whether Indo- 
nesia’s degraded peatlands can be restored 
and how they will respond in the future, says 
Wijedasa. 

Andersen agrees. “If degradation is too 
extensive, are we at risk of losing peatland 
areas before we can do anything about it?” 


Burning questions 


On 12 May 2019, a fire broke out on one of 
Andersen’s restoration sites in Scotland. 
She recalls sleepless nights spent tracking 
the fast-moving blaze as it burnt more than 
50 square kilometres. “It looked apocalyptic 
with an orange sky and dark clouds of smoke,” 
she says. “You could hardly breathe or see.” 
But what was most impressive, she recounts, 
is the speed at which it travelled. “It basically 
covered nearly 15 kilometres in one day.” 

Andersen says that unusually hot, dry condi- 
tions preceded the fire and left the Sphagnum 
moss brittle. “The rivers were the lowest they’ve 
been since 1976.” Serendipitously, a couple of 
the driest sites were part of the InSAR valida- 
tion study. The researchers found that the sur- 
face of the peat that had been most affected 
by the drought had collapsed, and it hadn’t 
recovered when it began to rain again before 
the fire. “We saw consequences that outlast the 
drought for a long period of time,” she says.

Still, the restoration efforts seemed to help. 
Areas that had good Sphagnum cover and 
remained wet despite the drought had only 
low or medium fire damage, compared with 
spots that were still actively drained and had 
only patchy Sphagnum cover, which received 
the deepest burns and damage, according to 
Andersen. 

Three weeks after the fire, she and her 
colleagues submitted a successful grant proposal
to the UK Natural Environment Research
Council to study the impact of the blazes. The
team will use ground measures, images from
crewless aerial vehicles, and InSAR data to
compare different types of peatland management
— some had been restored more intensively,
whereas others had been left to recover
with fewer interventions. The researchers will
assess how severely the peat burnt in each area,
how it recovers and how much carbon was lost. 
They have also installed a fifth flux tower in 
the burnt area to measure how the fires affect 
carbon emissions. These data will be useful 
as researchers determine how best to restore 
sites to withstand future climate stresses, says 
Andersen. 

Scotland has several advantages over other 
regions in its quest to restore peatlands — for 
example, landowners in the sparsely populated
Flow Country can still make a living from
restored peatlands, typically through tourism 
related to hunting and fishing. In Indonesia, 
however, people struggle to find crops that 
will grow on wet peaty soils and provide live- 
lihoods for residents. 

I was able to see at first hand some of the 
impacts of Scotland’s efforts last year dur- 
ing a slog through the rain at the Langwell 
and Braemore estate. Roughly 6,000 newly 
installed dams have stymied erosion on 
grounds used for stag hunting and fishing. 
Between the dams, the water has pooled 
and is dotted with iridescent mosses. Anson 
MacAuslan was among the first estate managers
to secure funding from Peatland Action — a
project funded by the Scottish government 
to restore peatlands. He has spent roughly 
£185,000 on restoring 7% of the 19,000-hec- 
tare estate. He has already seen direct benefits 
from the dams, which have reduced flooding 
risk and improved water quality in the streams
where salmon swim. 

As several of the neighbouring estates start 
their own restoration projects, Andersen says 
that the shift in public perception of peatlands 
has been a key legacy of the Flow Country restoration
project. There is even an effort afoot
to nominate the Flow Country as a United 
Nations Educational, Scientific and Cultural 
Organization (UNESCO) World Heritage site — 
which would be a first for a peatland. Although
people used to call this landscape worthless, 
she says, “we don’t hear that any more”. 


Virginia Gewin, a science journalist in 
Portland, Oregon, reported this story with 
support from the European Geosciences 
Union.

A cloud of interstellar gas and dust, captured by NASA’s Hubble Space Telescope.


From Big Bang
to cosmic bounce

A physicist and humanist takes us on
a grand tour of all time. By Philip Ball

Brian Greene’s Until the End of Time
sits within a tradition of grand,
synoptic visions of the Universe,
[…] American. Halfway through, I realized why.
With its scepticism of religion but openness
to humanistic wonder, awe of nature, celebration
of the individual and recognition
of the power of physical law, the narrative
has a strong whiff of transcendentalism.
There is an echo of philosopher Henry David
Thoreau in Greene’s account of lying out at
night, enraptured by the aurora borealis.
And essayist Ralph Waldo Emerson’s declara- 
tion that the “sublime laws play indifferently 
through atoms and galaxies” could almost be 
this book’s epigraph. 

Such qualities lift this work above many 
accounts of the cosmic story spanning from 
the Big Bang to the end of time — whether 
that’s a big rip, heat death or cosmic bounce. 
Greene takes us from quarks to consciousness, 
and from the origin of life to the genesis of 
language. He draws from an impressive range 
of sources, such as poets William Butler Yeats
and Sylvia Plath. In attempting to weave in 
the evolution of physical laws with that of 
the human mind and cultures, Greene’s aim 
vaults beyond that of his bestselling 1999 
book, The Elegant Universe. Until the End 
of Time is packed with ideas; whether they 
come together as a convincing story is 
another matter. 

This narrative features humanity as a 
brief moment when matter became self- 
aware. Current physical and cosmological 
theories imply that this state of affairs can’t 
last. Eventually proton decay, a dominance 
of dark energy or thermodynamic heat 
death will doom all matter and thought. 
Greene, however, suggests that intelligent 
beings could eke out their thought processes 
almost indefinitely by gradually slowing 
them to minimize their inevitable thermo- 
dynamic cost. 

He views this extinction of sentience as a 
cosmic tragedy. It’s poignant to see a modern
physicist, however girded with string theory, 
the general theory of relativity and the equa- 
tions of quantum mechanics, experience 
the same anguish that goaded ancient mon- 
archs to defy mortality by commissioning 
monumental tombs. Greene finds the solace 
that religion typically provides in the idea 
that the “small collection of the universe’s 
particles” that constitutes humanity can 
evolve and “with a flitting burst of activity 
create beauty, establish connection, and 
illuminate mystery”. 

His grand tour is sometimes breathtaking, 
necessarily selective and occasionally super- 
ficial. It often lacks the space or rigour to do 
its vast range of subjects justice. Beyond 
fundamental physics, Greene is a lucid sum- 
marizer of other popular accounts, but little 
more. That can leave his story patchy, and 
even misleading at times. His explanation 
for why water is a special solvent required 
for life attributes it all to the molecule’s 
polar nature — in which case it would not be 
special at all. (Hydrogen bonding is left out,
and although that does not tell the whole
story, neglecting it means we get almost 
no story at all.) To explain the origin of 
myths, the book offers a bit of obsolete 
early-twentieth-century anthropology 
from the likes of folklorist James George 
Frazer, that is given a contemporary gloss 
of evolutionary psychology. 

The biggest shortfall is in the account 
of how biology works, which seems to 
be derived largely from physicist Erwin 
Schrödinger’s 1944 book What Is Life?
and biologist Richard Dawkins’s 1976 The
Selfish Gene. Life in Greene’s reckoning is
all encoded in the genome, and once molec- 
ular replicators appeared on the planet, 
the rest was just evolutionary history. He 
adds that non-equilibrium thermodynam- 
ics can give us a head start: its tendency to 
create spontaneous knots and patterns of 
local order is a stepping stone towards
life’s organization. But what’s missing — 
foreshadowing a wider lacuna in the book 
— is any sense that intermediate levels of 
that organization, particularly the cell, are 
equally fundamental. 

When it comes to human behaviour — 
creativity, art, story, religion — Greene places 
a reductive faith in evolutionary psychology.
He is probably right to say that many of our 
complex behaviours are underpinned by 
rather basic adaptive impulses, but he doesn’t 
adequately acknowledge how culture shapes 
them. For instance, he supports psychologist 
Steven Pinker’s notorious description of 
music as “auditory cheesecake”. This posits 
that music is enjoyable because it piggy- 
backs on capacities that evolved for other 
reasons, such as the ability to separate our
auditory experience into comprehensible
chunks. This might or might not be true, but
to appreciate what music really means, we
need to consider its cultural, historical and 
social specifics, and not just attribute it to 
“our ancient adaptive sensitivity to sounds 
with elevated information content”. 

Whether in cell biology or a musical 
tradition, asking why any specific feature 
is the way it is demands that we consider 
a causal explanation. And therein lies the 
problem with Greene’s approach: where it 
seeks out cause. 

It’s true that when he enlists physics as the 
underpinning theory of everything (“Life 
is physics orchestrated”), this is not the 
physicist’s standard hubristic claim. Indeed, 
he points out that we need “overlapping 
narratives” for explanations of phenomena 
at different scales of size and complexity, 
from subatomic particles to galaxies, each 
of which must at least be consistent with the 
one below. And Greene acknowledges that 
an account of human behaviour at the level 
of fundamental particles would be pointless. 
But he still implies that causation flows 
upwards through the hierarchy of scales. 
We lack genuine free will, he says, because 
there is no such factor at play among the 
fundamental forces. 

Thus, Greene remains wedded to the idea 
that the most reductive view has ultimate 
authority — that it all comes down to parti- 
cles, entropy and evolution. “Perhaps one day 
we will invoke a unified theory of particulate
ingredients to explain the overwhelming 
vision of a Rodin,” he writes. He doesn't 
recognize that in complex systems, new 
properties and causative mechanisms that 
arise at only the higher levels of the hierarchy 
are as real and fundamental as, say, the strong
and weak nuclear forces. This is what physics 
Nobel laureate Philip Anderson argued in his 
1972 essay ‘More Is Different’. 

If we accept Anderson’s position, we have 
to call into question the entire programme 
that Greene articulates here. By the time we 
get to, say, the human impulse to create sto- 
ries, are Big Bang cosmology and quantum 
mechanics meaningful parts of the narrative? 
Perhaps, then, by setting out a vision of the 
world as seen by a thoughtful, humanistic 
fundamental physicist, Greene has offered 
not so much a state-of-play panorama as a
tour showing where that view works spectacularly
and where it falls short. It is an eloquent
invitation to debate. 


Philip Ball is a science writer and author; his
latest book is How To Grow a Human.
e-mail: p.ball@btinternet.com

Jack Baldwin


(1938-2020) 


Organic chemist whose rules aided the synthesis of natural products. 


“Chemistry,” Jack Baldwin once
said in his direct way, “is about
making forms of matter that
have never existed.” Baldwin
was best known for formulating
a set of rules that predict how likely it is that
atoms (mostly carbon) in a synthesis will link
into rings, a structural feature of many biological
molecules and drugs. Published in just
three pages (with a one-sentence abstract) in
1976 (J. E. Baldwin J. Chem. Soc. Chem. Commun.
734-736; 1976), Baldwin’s rules have
been fundamental to organic synthesis in the 
pharmaceutical and agrochemical industries, 
and to understanding biology from a chemical 
perspective. He died on 4 January, aged 81. 

His passions also encompassed finding out 
how nature makes chemicals that researchers 
cannot. This led him to ‘biomimetic’ synthe- 
sis: using the principles of nature to improve 
the generation of biomolecules in the labo- 
ratory. He particularly relished the challenge 
of ‘molecules from Mars’, his term for natural 
products whose biosynthesis was baffling. 

Baldwin’s interest in rings led him to study 
antibiotics that contain a β-lactam ring, the
best known of which is penicillin. He worked 
initially with Edward Abraham, who had been 
part of the team that developed penicillin 
and who went on to reveal the activity of 
broad-spectrum antibiotics known as cepha- 
losporins. Baldwin uncovered the mechanistic 
basis of the enzyme action that catalyses the 
formation of the two rings at the heart of the 
penicillin molecule. Others have since found 
that related enzymes are involved in many bio- 
logical processes, including how the human 
body responds to low levels of oxygen. 

Baldwin was born in London, and studied 
chemistry at Imperial College London, where 
he also did a PhD. He was supervised by Derek 
Barton, a pioneer of conformational analysis
— the idea that the reactivity of a molecule
could predict its preferred 3D shape — who 
later won a Nobel prize. Barton had a major 
impact on Baldwin’s career. 

After four years on the staff at Imperial 
College, Baldwin spent more than a decade 
in the United States, working first at Pennsyl- 
vania State University in State College and then 
at the Massachusetts Institute of Technology 
(MIT) in Cambridge. With an able young team 
and the latest instruments, his MIT period was 
particularly productive. To develop a detailed 
picture of how atoms arrange themselves
during organic reactions, he combined
theoretical and geometric considerations with 
structural information. His team obtained this 
using techniques such as nuclear magnetic 
resonance and X-ray crystallography. At MIT, 
he created a class of biomimetic molecule 
that reversibly binds oxygen when com- 
plexed with iron, just as haemoglobin does in 
the blood, and formulated his rules for ring 
formation. It was also where he met his future 
wife, Christine Franchi, who built a career in 
academic publishing. 

In 1978, Baldwin was recruited to head the 
Dyson Perrins Laboratory at the University 
of Oxford, UK. As only the fourth person to 
hold the chair in organic chemistry since the 
laboratory opened in 1916, he transformed his
discipline at Oxford, in terms of both scientific
ambition and equipment. Baldwin brought 
with him researchers from his internationally 
diverse lab at MIT, and continued to recruit 
people with a wide range of backgrounds. 

Many of his students, who knew him as 
‘J.E.B.’, went on to lead research all over the
world. The output of his lab was prodigious: 
he is an author on at least 700 papers. In 
1988, he became the founding director of the 
Oxford Centre for Molecular Sciences, which 
he headed for 10 years. The centre helped to 
link physical and biological sciences in Oxford. 

The pioneering role of Oxford scientists in 
the extraction, testing and structural analysis 
of penicillin during the 1940s inspired Baldwin’s
extensive work on trying to make the drug
from scratch. His respect for the optimally
efficient process by which microbes produce
the molecule — even now, most penicillin 
antibiotics continue to be produced through 
fermentation — led him further into the field of 
biomimetic synthesis. His favoured approach 
for building complex multi-ring structures 
was to try to mimic nature’s strategy of mak- 
ing a relatively simple linear framework that 
is predisposed to react to give multiple rings 
in a single step. The ‘molecules from Mars’ he
made using this approach included unusual 
alkaloids derived from marine sponges and 
rare rainforest plants. 

Baldwin had little time for the academic 
conventions of Oxford: he spoke his mind, 
and could seem pugnacious in scientific 
debate. But his forceful leadership style 
belied a generosity in his treatment of junior 
colleagues. Wholly committed to research, 
he never sought seats on prestigious commit- 
tees, although his distinction brought many 
honours, including a knighthood in 1997. He 
developed links with the chemical industry 
and championed its role in society, encouraging
his students to pursue industrial careers.
Aside from science, he enjoyed good food, fine 
wine, powerful motorbikes, fast cars and his 
dogs. After he retired in 2005, he continued 
to co-author publications until just months 
before his death. 


Georgina Ferry is a science writer based in 
Oxford, UK. Her books include biographies of 
Dorothy Crowfoot Hodgkin, Max Perutz and 
John Sulston.

Adopt a carbon tax to protect
tropical forests


Edward B. Barbier, Ricardo Lozano, Carlos Manuel Rodriguez & Sebastian Troëng


A levy on fossil fuels can
support and restore 
ecosystems that help to 
stem climate change. 


Deforestation must be stopped in
tropical countries to tackle the existential
threats of climate change and
biodiversity loss. The vast majority
of Earth’s species are in the tropics;
forests there have taken in much of the carbon
added to the atmosphere by human activities.
Safeguarding these forests is central to slashing
greenhouse-gas emissions and meeting
the internationally agreed United Nations
Sustainable Development Goals (SDGs).
Sadly, in tropical countries and
internationally, investments are woefully 
inadequate in conservation, restoration 
and improving land management to protect 
biodiversity and ecosystem services — 
collectively called ‘natural climate solutions’. 

To plug this gap, we urge more countries 
that have tropical forests to adopt a tropical 
carbon tax — in South and Central America, 
Africa, Asia and the Pacific. This is a levy on
fossil fuels that is invested in natural climate
solutions. Such a policy can reduce the use of
oil, gas and coal and mobilize domestic funds 
for adaptation and mitigation. 

Costa Rica and Colombia have done this. 
Our own analysis shows that, if 12 other countries
roll out a tropical carbon tax similar to
Colombia’s, they could raise US$1.8 billion 
each year between them to invest in natural 
habitats that benefit the climate (see Supple- 
mentary Information). 

We call on governments, development 
banks, financial investors and non-governmen- 
tal organizations to support those countries 
that need financial and technical help to implement
this policy, and to ensure that the money
raised is spent efficiently and effectively. 


Twin threats 


Almost one-quarter of the emissions caused 
by humans come from agriculture, forestry, 
fibre and livestock production. It has been
estimated that tropical deforestation can 
contribute as much to emissions as do some 
large nations (see go.nature.com/37gmwvy). 
If present trends continue, by 2050 the world 
will have lost a further area of tropical forest 
almost the size of India — 289 million hectares.
This could squander half of the remaining 
global carbon budget for limiting warming 
to 1.5 °C above pre-industrial levels.

Meanwhile, more than three-quarters of 
species live in the tropics. These are under 
greater threat of extinction than is life elsewhere,
mainly because of deforestation.

There is a quick, cheap way to halt these 
trends: reducing the conversion of land in 
the tropics, especially of forests, peatlands 
and mangroves. Alongside cuts to fossil-fuel 
emissions, up to 37% of the mitigation needed 
to hold warming to the Paris agreement goal 
(to avoid the catastrophic impacts of climate 
change) might be achieved in this way, at a cost 
of less than $100 per tonne of CO₂ equivalent
— the standard measure for greenhouse-gas
emissions. One-third of these mitigation
options could cost less than $10 per tonne.

But ecosystem conservation, restoration 
and management received just 3% of global 
finance for climate mitigation in 2017-18: an 
average of $18 billion. Most of the remainder
was spent on renewable-energy generation
and on investments in low-carbon transport,
such as railways and electric vehicles.

Extra cash is unlikely to come from the 
international community in the near future, 
and aid and other funding is already scarce for 
biodiversity conservation in tropical countries.
Such nations urgently need a new way
to fund natural solutions to climate change. 


Case studies 

Colombia and Costa Rica have blazed a trail. 
Since 1997, Costa Rica has collected a 3.5% tax 
on fossil fuels. That now generates $26.5 million
per year (see go.nature.com/3jdpmtk;
in Spanish). The tax was negotiated in Costa
Rica’s legislative assembly and supported by 
research from the non-governmental Tropical 
Science Center in San José, which examined 
the benefits of forests to the country’s econ- 
omy. Implementation faced little opposition 
because the tax was incorporated with other 
fiscal reforms. Surveys of fossil-fuel users indi- 
cated that they did not object if revenues were 
directed to forest conservation. 

“Investments in protecting
biodiversity to reduce
carbon emissions can
favour poor people.”

To invest the money raised, Costa Rica
created its National Forest Fund (FONAFIFO).
For example, from 1997 to 2018, the fund paid
out to landowners across 23.5% of the country
— an area of 1.2 million hectares. They spent the
money on projects to protect 1 million hectares 
of mature forest and 71,000 hectares under 
reforestation. The fund supports conserva- 
tion of mature forests, reforestation using 
native or exotic species, and agroforestry 
systems that use a mix of trees and crops or 
grasslands. It has disbursed $500 million to 
roughly 18,000 people, including those living 
across 162,000 hectares of Indigenous lands, 
such as the Cabécar and Bribri territories. 
Transparency and accountability of the fund’s 
operations are important to its success and 
continued popularity, so strategic and oper- 
ational plans, budgets, financial statements 
and other details are available online (see 
www.fonafifo.go.cr). 

In the 1980s, Costa Rica had the highest 
deforestation rates in the world. Forest cover 
more than doubled between 1986 and 2013, 
rising to 53% (ref. 8). Although estimates 
remain uncertain, we think that the fossil-fuel 
tax, along with a decline in the profitability 
of livestock and the expansion of protected 
areas and ecotourism, contributed to this. The 
programme funded by the fuel tax has been 
especially effective away from protected areas 
and their buffer zones.

Colombia rolled out a carbon tax in 2016 as 
part of sweeping fiscal reforms. These garnered 
broad political support because of the need to 
raise money for the country’s peace process. 
The carbon tax was developed by the Ministry 
of Finance and Ministry of Environment and 
Sustainable Development, and is collected from
companies producing or importing fossil fuels. 

Colombia’s tax of $5 per tonne of emitted car- 
bon yielded revenues of $148 million in 2017 and 
$91 million in 2018 (see go.nature.com/3b8ufkj; 
in Spanish). These go to the Colombian Peace 
Fund (Fondo Colombia en Paz), from which 
25% is used to manage coastal erosion, reduce 
and monitor deforestation, conserve water
sources, protect strategic ecosystems and
combat climate change. A further 5% is used 
to strengthen Colombia’s National System 
of Protected Areas. The revenue will be used 
for conservation projects in the following 
prioritized areas: flood-plain forests, tropical 
montane cloud forests, tropical humid forests, 
tropical savannahs and Andean forests. These 
projects are in the development phase and are
waiting to access the fund. There is also a pro- 
ject to enhance the Colombian Environmental 
Information System (SIAC), a web-based platform
that provides official information on the
state of the country’s natural resources and
which is under development (see
go.nature.com/2hthzqw; in Spanish).
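
The figures in the two paragraphs above can be combined into a rough estimate of how much money reaches environmental programmes. The sketch below is illustrative only: it assumes that the 25% and 5% shares apply directly to each year's carbon-tax revenue and that the $5 rate applied uniformly to every taxed tonne. The function and constant names are hypothetical.

```python
# Rough split of Colombia's carbon-tax revenue, using figures quoted in the text.
# Assumptions: the 25% and 5% shares apply to the carbon-tax revenue itself,
# and the US$5 rate applied uniformly to every taxed tonne.

TAX_PER_TONNE_USD = 5.0
REVENUE_USD = {2017: 148e6, 2018: 91e6}
ECOSYSTEM_SHARE = 0.25        # coastal erosion, deforestation, water, ecosystems
PROTECTED_AREAS_SHARE = 0.05  # National System of Protected Areas


def implied_taxed_tonnes(revenue_usd: float, rate_usd: float) -> float:
    """Tonnes of emissions that would generate the given revenue at this rate."""
    return revenue_usd / rate_usd


def environmental_allocation(revenue_usd: float) -> float:
    """Revenue directed to the two environmental uses described in the text."""
    return revenue_usd * (ECOSYSTEM_SHARE + PROTECTED_AREAS_SHARE)


for year, revenue in sorted(REVENUE_USD.items()):
    tonnes = implied_taxed_tonnes(revenue, TAX_PER_TONNE_USD)
    allocation = environmental_allocation(revenue)
    print(f"{year}: {tonnes / 1e6:.1f} million tonnes taxed, "
          f"${allocation / 1e6:.1f} million for environmental uses")
```

On these assumptions, the combined 30% share corresponds to roughly $44 million in 2017 and $27 million in 2018.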

A mechanism called carbon neutrality allows
companies to reduce their tax burdens by buy- 
ing certified carbon credits from conservation 
and restoration projects in Colombia that 
adhere to internationally recognized stand- 
ards. For example, a company might buy a 
credit in a region that promotes social initiatives
with communities that are involved
in managing these projects. This is the case
for communities in the Choco departmental
region of northwestern Colombia, such as
those living near towns including Acandi,
El Carmen del Darién and Baudo.

People in the Democratic Republic of the Congo at a charcoal market — the fuel is one of the causes of deforestation in the country.


Join in

Up to 70% of the world’s biodiversity is found 
in just 17 ‘megadiverse’ countries. Thirteen
contain tropical forests. In 2018, these countries
lost almost 7.3 million hectares of forests
— an area roughly the size of Panama. According
to our estimates, that represented nearly
30% of global deforestation and may have 
released about 7% of worldwide carbon emis- 
sions (see Supplementary Information and 
www.globalforestwatch.org/map). 

Two scenarios illustrate how these countries 
could benefit from a tropical carbon tax (see
also Supplementary Information). The first 
assumes that each follows a similar policy to 
that of Colombia, introducing a tax of $5 per 
tonne of carbon emitted, and allocating 30% 
of the revenues to natural solutions to climate 
change and measures that conserve forests. 
The second assumes a tax of $15 per tonne of 
carbon emitted and 70% allocation. 
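
The arithmetic behind the two scenarios can be sketched in a few lines of Python. This is an illustration only: the function names, the taxed-emissions figure and the forest-loss figure are hypothetical placeholders, not values from the Supplementary Information. Only the tax rates ($5 and $15 per tonne) and the allocation shares (30% and 70%) come from the text.

```python
def allocated_revenue(taxed_tonnes: float, price_per_tonne: float, share: float) -> float:
    """Dollars earmarked for natural climate solutions under a carbon tax."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return taxed_tonnes * price_per_tonne * share


def funding_per_hectare(earmarked_dollars: float, hectares_lost: float) -> float:
    """Earmarked dollars available per hectare of annual forest loss."""
    return earmarked_dollars / hectares_lost


# Hypothetical country (assumed figures, for illustration only).
TAXED_TONNES = 100e6       # tonnes of carbon subject to the tax each year
FOREST_LOSS_HA = 200_000   # hectares of forest lost each year

scenario_1 = allocated_revenue(TAXED_TONNES, 5.0, 0.30)    # Colombia-like policy
scenario_2 = allocated_revenue(TAXED_TONNES, 15.0, 0.70)   # more ambitious policy

for label, dollars in (("scenario 1", scenario_1), ("scenario 2", scenario_2)):
    per_ha = funding_per_hectare(dollars, FOREST_LOSS_HA)
    print(f"{label}: ${dollars / 1e6:,.0f} million earmarked, ${per_ha:,.0f} per hectare")
```

Whatever the emissions figure, the second scenario earmarks seven times as much as the first, because the price is tripled and the share rises from 30% to 70%.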

We provide this second option because
we think that both the urgency and interest
in addressing climate change and biodiversity
loss will continue to grow. It is also likely
that some governments will choose to adopt
such a higher carbon price and allocate more
revenues to natural climate solutions.

For some countries, notably India, the 
Philippines, Mexico, Ecuador and Malaysia, the 
sums raised could provide hundreds of dollars 
per hectare to counter forest loss. The more 
ambitious policy could yield nearly $13 billion 
each year for natural climate solutions. 

Brazil, the Democratic Republic of the 
Congo and Indonesia would benefit the 
most, because they currently have the great- 
est amount of deforestation. Countries that 
have experience in developing high-quality 
carbon-offset projects, such as Peru and 
Ecuador, are well positioned to adopt a trop- 
ical carbon tax (see go.nature.com/2tptk21). 


Politically challenging 

There are three main criticisms of funding 
natural climate solutions through carbon 
taxes. First, that they cause ‘leakage’ — the 
displacement of deforestation to other areas.
Second, that they reduce the incentive to
reduce emissions through renewable energy. 
And third, that the tax revenue should be used 
for other purposes. 

We think that each of these problems can be 
addressed. National tax schemes reduce the 
likelihood of leakage in each country. Renew- 
able-energy production and natural climate 
solutions are both essential, as indicated by 
scenarios from the Intergovernmental Panel 
on Climate Change. Finally, although there
are many worthy uses of tax revenue, the 
severity of climate change and biodiversity 
loss means that stemming both at once is 
a development priority for tropical-forest 
countries. 

We also recognize that it can be politically 
challenging to introduce measures that 
increase the cost of living. But as the examples 
in Costa Rica and Colombia illustrate, invest- 
ments in protecting biodiversity to reduce 
carbon emissions can favour poor people 
because such investments have wider social 
benefits beyond landowners and parks. In
Costa Rica, forests and high levels of poverty 
can often be found in the same districts, so
revenues destined for conservation can also
contribute to social development. The Costa
Rican government prioritizes such districts
for payouts for ecosystem services. It also
assists smallholder farmers and Indigenous
communities in submitting requests for
funds. Today, 40% of beneficiaries in Costa
Rica are communities that live below the
poverty line.

A cotton-top tamarin (Saguinus oedipus) in one of Colombia’s protected national parks.

Ecosystem services such as drinking-water
supply, food provision and cultural services
are estimated to contribute between 50%
and 90% of income and subsistence among
the rural poor and those who live in forests.
Such services can make an important
contribution to ending extreme poverty (SDG 1),
achieving zero hunger (SDG 2), improving
health (SDG 3) and meeting many of the other
14 SDGs.


International support


The World Bank, the International Monetary
Fund (IMF) and other multilateral agencies
should encourage more countries to adopt
a tropical carbon tax. The IMF already
promotes carbon taxes as an efficient and fiscally
responsible way of reducing emissions,
with revenues being used for much-needed
public investments in developing countries.
The international community can support
more-widespread adoption of a tropical
carbon tax in two important ways.

First, some tropical-forest countries and
other low-income nations will require extra
financial assistance because they might
be unable to raise sufficient funds from
a carbon tax. For example, if Papua New
Guinea, Madagascar and the Democratic
Republic of the Congo adopted Colombia’s
approach for combating each hectare of
forest loss with natural climate solutions,
they would generate only $23, $3 and $1
per hectare, respectively (see Supplementary
Information). Top-up financing could
come from bilateral assistance, or from the
Special Climate Change Fund and the Least
Developed Countries Fund. Both of these
are managed by the Global Environment
Facility for the UN Framework Convention
on Climate Change (UNFCCC).

“The severity of climate change and biodiversity loss means that stemming both at once is a priority.”

Second, many tropical-forest countries
will require technical support to guide and
monitor their investments. Countries should
comply with recognized global quality marks
such as the Verified Carbon Standard
(https://verra.org/project/vcs-program) and the
Climate, Community and Biodiversity Standard
(https://verra.org/project/ccb-program).
The first is the world’s most widely used
voluntary programme for mitigating
greenhouse-gas emissions. The second
identifies projects that simultaneously address
climate change, support local communities
and smallholders, and conserve biodiversity.
Currently, the projects that have been
validated and verified encompass more than
10 million hectares, an area the size of Iceland
(see https://verra.org/project/ccb-program).

Tropical countries are already showing
interest in carbon-pricing initiatives and
natural climate solutions. Next week, Costa
Rica will host a high-level meeting on the
subject in San José with government and business
leaders from Peru, Ecuador, Mexico and Chile,
as well as Colombia.

And several major international events
in 2020 provide a platform for supporting
global action towards a tropical carbon tax.
These include the International Union for
Conservation of Nature’s World Conservation
Congress in June, the 15th meeting of the
Conference of the Parties (COP15) to the
Convention on Biological Diversity in Kunming,
China, in October, and the 26th session of the
UNFCCC Conference of the Parties (COP26) in
Glasgow, UK, in November. We suggest that,
at these meetings, policymakers explicitly
highlight and incorporate a tropical carbon
tax in agreements and decisions.

Tropical deforestation and land-use
change must be halted to safeguard the
climate and global biodiversity. The
widespread adoption of a tropical carbon tax is a
practical way forward.


Readers respond


Correspondence 


Clamp down on 
trade-ban violations 


In response to the deadly 
outbreak of coronavirus 
2019-nCoV (see Nature http://doi.org/dk47;
2019), China has
temporarily banned the sale of 
wildlife in markets, restaurants 
and online. Given that much
of this trade is already illegal,
stricter enforcement and 
prosecution measures are 
needed if the consumption of 
wild animals is to be brought 
under control. 

At present, prosecutions are 
often obstructed because of 
inconsistencies in the naming 
of species (Z.-M. Zhou et al.
Nature 525, 187; 2015). Online 
trading in low-profile illegal 
wildlife as pets is commonplace 
(Y.-C. Ye et al. Conserv. Sci. Pract. 
http://doi.org/dk49; 2020). And 
the public’s desire for exotic 
wildlife products remains 
undiminished — particularly for 
use in traditional medicines. 
Dodging the law on such a
scale is a disaster for global 
biodiversity and animal welfare, 
as well as for human health. 

When, or if, wildlife trade 
is again permitted, it must 
be better scrutinized so 
that stringent hygiene and 
quarantine standards at 
markets can be enforced. 
Advertisements will need to 
include the scientific names 
of species as well as their 
provenance. Supplies from 
licensed captive breeders 
must be properly regulated 
and inspected — a step that 
would also help pin down 
violations of the Convention 
on International Trade in 
Endangered Species of Wild 
Fauna and Flora (CITES). 


Zhao-Min Zhou* China West 
Normal University, Nanchong, 
China. 

zhouzm81@gmail.com 

*On behalf of 4 correspondents 
(see go.nature.com/39iz5hc).

The Chinese authorities have imposed a temporary ban on the wild-food trade.


Total ban on wildlife 
trade could fail 


The Chinese government’s 
temporary ban on the
domestic transport and sale 
of wild animals following the 
emergence of coronavirus 
2019-nCoV is welcomed 
by environmental non- 
governmental organizations 
pushing for a permanent ban 
(see go.nature.com/3b9kqcx). 
But China’s cultural demand 
for wildlife items could mean 
that a blanket ban would be 
counterproductive. 

Total bans are controversial 
because they risk fuelling 
an intractable, uncontrolled 
and highly priced illegal 
trade, sustained by the rising 
incomes and social status of 
the country’s growing middle 
class (D. W. S. Challender et al. 
Front. Ecol. Environ. 17, 199–200;
2019). China’s complex culture 
is at the root of its demand for 
exotic wildlife items such as 
pangolin scales, tiger bones 
and rhino horns. Likewise, the 
consumption of game meat is 
regarded as healthy as well as 
an indicator of wealth. Markets 
selling such produce are 
prime candidates for passing
on new viruses.

This complex issue needs to 
be managed through initiatives 
that discourage consumption, 
such as wisely directed 
education campaigns that aim 
to discredit engrained cultural 
beliefs. 


Joana Ribeiro* CIBIO-InBIO, 
University of Porto, Portugal. 
joanateixeiraribeiro@gmail.com 
*On behalf of 4 correspondents 
(see go.nature.com/2udauk9) 


Romania: help 
astronomers return 


The political climate seems
to be improving under the
new government in Romania,
but the country’s research
is still hampered by the
Romanian Academy’s outdated
regulations. These discourage
Romanian citizens who have
pursued careers abroad from
returning to many institutes,
including to the Astronomical
Institute of the Romanian
Academy (AIRA) in Bucharest.
As an astronomer of Romanian
origin working in Spain, I urge
the government to persuade the
Romanian Academy to reform
its regulations and open up its
research to its citizens working
abroad and to scientists from
the rest of the European Union.
There are no graduate 
astronomy departments in 
Romanian universities and 
the country has no useful 
observatories. When senior 
astronomers retire, there is no 
one to replace them because the 
bright young astrophysicists 
have all decamped abroad. 
Although the academy 
announced in 2016 that its doors 
are open to EU researchers, jobs 
are advertised only in Romanian. 
Researchers wishing to return 
home must have their foreign 
PhD qualifications validated in 
Romania; they are then graded 
according to their previous 
Romanian employment. 
Foreign candidates and 
citizens who trained abroad are 
excluded from senior research 
positions. For example, a 
high-grade post in astrophysics 
recently went to a home-grown
researcher from another 
discipline. The academy’s 
arcane rulings must be scrapped 
if Romania is to compete in 
international science. 


Ovidiu Vaduvescu La Palma, 
Canary Islands, Spain. 
ovidiu.vaduvescu@gmail.com


Quantum cascade laser
lives on the edge


Sunil Mittal & Edo Waks 


Devices known as quantum cascade lasers produce useful 
terahertz radiation, but are typically highly sensitive to 
fabrication defects. This limitation has now been overcome 
using a property called topological robustness. See p.246 


Electromagnetic waves with frequencies in 
the terahertz range (300 GHz to 10 THz) have 
applications in many areas, from imaging and 
security screening to the atmospheric and 
biological sciences. Semiconductor devices 
called quantum cascade lasers (QCLs) provide 
the most compact and efficient way to gener- 
ate terahertz radiation. In QCLs, electrons 
cascade down in energy through a series of 
discrete quantum energy levels, emitting a 
photon at each step. But, as with all compact
semiconducting lasers, QCLs are notoriously 
sensitive to fabrication imperfections, which 
results in device-to-device variability of the 
laser output frequency. Now, on page 246, Zeng 
et al. report the realization of a terahertz QCL
that is insensitive to such disorder. This
achievement opens the door for terahertz lasers and
optoelectronics that have unprecedented 
stability and fabrication reproducibility. 
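
As a rough guide to what the terahertz range means physically, the band edges quoted above can be converted to free-space wavelength (λ = c/f) and photon energy (E = hf). The sketch below is illustrative. The function names are hypothetical, and the constants are the exact SI defining values.

```python
C = 299_792_458.0        # speed of light in vacuum, m/s
H = 6.62607015e-34       # Planck constant, J s
EV = 1.602176634e-19     # joules per electronvolt


def wavelength_um(frequency_hz: float) -> float:
    """Free-space wavelength in micrometres."""
    return C / frequency_hz * 1e6


def photon_energy_mev(frequency_hz: float) -> float:
    """Energy of one photon in millielectronvolts."""
    return H * frequency_hz / EV * 1e3


# Band edges given in the text: 300 GHz and 10 THz.
for f in (300e9, 10e12):
    print(f"{f / 1e12:5.1f} THz: {wavelength_um(f):7.1f} um, {photon_energy_mev(f):5.2f} meV")
```

The band therefore runs from wavelengths of about 1 mm down to about 30 µm, with photon energies of roughly 1 to 41 meV. This is the size of the energy step that an electron must give up at each stage of the cascade to emit a terahertz photon.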

Lasers use a process known as optical 
feedback to build up light intensity and
stimulate electrons to emit photons. A common way
to introduce this feedback uses a structure 
called an optical cavity, which is typically com- 
posed of mirrors that reflect the emitted light 
back into the device. Compact lasers, however, 
use more-complex structures such as photonic 
crystals — materials that have a periodically 
varying refractive index. If this periodicity 
is carefully engineered, photonic crystals 
can be used to reflect light waves of only the 
desired frequency, and so achieve lasing. But
this approach is highly sensitive to disorder, 
because any imperfections in the photonic 
crystal cause reflections that result in waves 
of unwanted frequencies. These compete with 
the desired waves, leading to unstable light 
intensity and poor laser efficiency. 

In the past few years, ‘topological’ photonic
structures have emerged as a way to make
photonic devices that are insensitive to
disorder. This area of research originated from 
concepts developed in condensed-matter 
physics. Over the past two decades, 
condensed-matter physicists have been able 
to use the mathematical descriptions of sym- 
metries and topology to characterize different 
forms of matter. Of particular relevance to the 
current work are exotic materials known as 
topological insulators.

As the name suggests, these materials 
are insulators — that is, they do not conduct 
electricity in their interior. However, they 
host conducting electronic states at their 
boundaries. Such edge states can carry cur- 
rent in only one direction and are therefore 
robust against disorder that would otherwise 
scatter charge carriers. This robustness of
edge states is a manifestation of the overall
topological properties of the material. Topo- 
logical insulators are so insensitive to disorder 
that they were previously used to define the 
unit of resistance: the ohm. 

Although topological physics originated in 
the field of electronics, it has begun to inspire
photonics. Disorder and scattering are even
more problematic in optics than in electronics, 
because photons exhibit strong interfer- 
ence effects that can lead to complicated, 
difficult-to-control laser behaviour. Trans- 
lating topological protection into the optical 
domain opens up the possibility of making 
robust optical systems. In particular,
topological lasers can emit light in a way that is robust
against scattering and other consequences 
of imperfections. But previous realizations 
of topological lasers have operated at
frequencies above the terahertz range. 

Zeng and colleagues overcame this limita- 
tion by incorporating topological protection 
into a QCL. To achieve this, they used a topo- 
logical model known as the valley Hall effect, 
which relies on breaking the spatial-inversion 
symmetry of a crystal lattice (its symmetry
under the combination of a 180° rotation and
a mirror reflection). Specifically, the authors
used a gallium arsenide–aluminium gallium
arsenide substrate as the gain material — 
the medium in which light is amplified. This 
substrate contained layered semiconductor 
structures called quantum wells that were 
designed to support quantum cascade lasing. 

The authors drilled a triangular lattice of holes
in the gain material (Fig. 1). The symmetries of
this lattice resulted in the emergence of two
valleys in the energy-momentum band
structure (the relationship between the energy
and momentum of photons in the material).
The authors made the holes quasi-hexagonal
so that they broke the spatial-inversion
symmetry of the lattice and rendered the two
valleys topologically inequivalent. This led
to the formation of topological edge states
at the interface between two such crystal
lattices in which the orientation of holes (and
valleys) was flipped in one lattice with respect
to the other.

Figure 1 | Design of a topological laser. Zeng et al. have made a laser in which terahertz radiation is emitted
from the interface between two triangular crystal lattices that consist of quasi-hexagonal holes in a substrate
material. The crystal lattices are topologically inequivalent because the orientation of the holes is flipped in
one lattice with respect to the other, and this leads to the emergence of exotic photonic states called edge
states at the crystal-lattice interface. The topological nature of these edge states renders the laser robust
against fabrication imperfections.
(Figure labels: Crystal lattice 1; Terahertz radiation at crystal-lattice interface.)
Zeng and co-workers used these topological 
edge states to design and make a robust 
ring resonator (a type of optical cavity that 
traps light at certain ‘resonance’ frequen- 
cies) in the form of a triangle (Fig. 1). It is this 
triangular cavity that, along with the light 
amplification from the substrate material, 
forms a topological laser. The laser produces 
light of many frequencies that are separated 
by similar frequency gaps. These frequencies 
correspond to the resonance frequencies of the
triangular cavity and fall within the frequency 
range of the QCL gain material. 
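
The evenly spaced lasing lines are what one would expect from a ring cavity: a wave must return in phase after one round trip, so the allowed frequencies are approximately f_m = m c / (n L), where L is the perimeter, n is the effective refractive index and m is an integer. A minimal sketch follows, assuming an index that does not vary with frequency. The perimeter, index and gain band are hypothetical placeholders, not the parameters of Zeng and colleagues' device.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def mode_spacing_hz(perimeter_m: float, index: float) -> float:
    """Frequency gap between neighbouring ring-cavity resonances."""
    return C / (index * perimeter_m)


def modes_in_band(perimeter_m: float, index: float,
                  f_min_hz: float, f_max_hz: float) -> list[float]:
    """Resonance frequencies f_m = m * spacing that fall inside the gain band."""
    spacing = mode_spacing_hz(perimeter_m, index)
    m_low = math.ceil(f_min_hz / spacing)
    m_high = math.floor(f_max_hz / spacing)
    return [m * spacing for m in range(m_low, m_high + 1)]


# Assumed values, for illustration only.
PERIMETER_M = 3e-3            # 3 mm round trip around the triangle
INDEX = 3.6                   # effective index of the guided edge state
GAIN_BAND = (3.0e12, 3.4e12)  # frequency range with amplification, Hz

modes = modes_in_band(PERIMETER_M, INDEX, *GAIN_BAND)
print(f"spacing: {mode_spacing_hz(PERIMETER_M, INDEX) / 1e9:.1f} GHz, "
      f"{len(modes)} modes inside the gain band")
```

In a real device the index depends on frequency, so the gaps are similar rather than identical, as the text describes.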

The authors measured light emission from 
different points along the perimeter of the 
cavity and discovered that the emission at 
each point had the same resonance frequen- 
cies. This indicates that these waves travelled 
through the length of the cavity, traversing 
the sharp (60°) bends at the corners of the 
triangle. Furthermore, Zeng et al. found that 
the lasing frequencies did not change when 
they introduced defects, in the form of extra 
holes, around the cavity, demonstrating the 
robustness of the QCL. 

Another key feature of this laser is that 
energy is ‘pumped’ into the device electrically.
Previous topological lasers have been
optically pumped, which means that they require
a second laser source to drive the topological
laser to generate light. This pumping scheme 
severely limits practical applications. How- 
ever, similar to many commonly used lasers 
(such as laser pointers), Zeng and colleagues’
QCL can be directly driven by an electrical 
current, allowing it to be powered, in princi- 
ple, by a battery or a wall outlet, rather than 
by another laser. 

Robustness against defects and disorder 
is one defining characteristic of topological 
physics, but another important feature is a 
type of asymmetry called chirality. In particu- 
lar, in the valley Hall effect, the two valleys are 
associated with photons of opposite circular 
polarization in the plane of the material. If 
right-circularly polarized photons travel to the 
left, then left-circularly polarized ones would 
travel to the right. Realizing this chirality
represents a crucial future step towards terahertz
topological lasers in which light waves flow
around a ring resonator in only one direction.
The chirality could be incorporated either by
explicitly breaking time-reversal symmetry
(a symmetry in which reversing the direction
of light waves is equivalent to running time 
backwards) or by introducing directional light 
amplification in the cavity. 

Zeng and co-workers’ results pave the way 
for studying topology in a previously
inaccessible part of the electromagnetic spectrum.
One area of great interest for future research 
is the application of other topological mod- 
els, such as exotic (higher-order) topological 
insulators, to make robust terahertz lasers that 
have other geometries. For example, these 
lasers could emit light at the corners, rather 
than at the edges, of a triangular cavity. 

Another fascinating prospect is the explora- 
tion of non-Hermitian (open) physical systems 
at terahertz frequencies, in which the presence 
of light amplification and loss can lead to the 
emergence of features such as parity-time 
symmetry (symmetry under the combina- 
tion of a mirror reflection and time reversal) 
and exceptional points (spectral features
that correspond to coalescing resonances).
The realization of topological photonics in 
the terahertz range could therefore serve as 
a catalyst for the development of practical 
devices, and also enable a better fundamen- 
tal understanding of topological physics and 
complex (nonlinear) optoelectronics.


Long-distance coupling
in a promiscuous protein


Thorsten Althoff & Jeff Abramson 


Unlike many sugar-transporting proteins, a transporter in one 
species of malaria parasite can import several types of sugar 
equally effectively, aiding the parasite’s survival. The structure 
of this protein reveals the reason for its versatility. See p.321 


Most cases of malaria are caused by the 
protozoan parasite Plasmodium falciparum.
Given that there are more than 400,000 malar- 
ia-associated deaths annually, and that 
P. falciparum is constantly evolving to resist 
pharmacological therapies, opportunities 
for developing drugs that target this organ- 
ism must be continuously explored. A protein 
called the P. falciparum hexose transporter 1 
(PfHT1) has a proclivity for scavenging sugars 
from an infected host’s red blood cells to 
improve the parasite’s chances of survival in 
these cells, and is therefore a drug target. On 
page 321, Qureshi et al. describe the 3D
structure of PfHT1, and identify a mechanism that
couples the docking of a sugar in the PfHT1 
binding site to the process by which sugars 
are gated through the protein. This coupling 
facilitates the protein’s substrate promiscuity 
— that is, its ability to transport a wide range 
of sugar molecules effectively, a feature that 
gives the parasite a distinct survival advantage. 

Transporter proteins shuttle substrate
molecules across the otherwise impermeable
lipid bilayer of the cell membrane. The 
functional and dynamic properties of these 
membrane-embedded proteins are funda- 
mentally related to their 3D structures, 
which are modulated at the atomic level 
over a broad range of timescales. Membrane 
transporters use the alternating-access 
mechanism for gating, in which access to the
substrate-binding site switches from one side 
of the membrane to the other (Fig. 1). 
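
The alternating-access idea can be pictured as a small state machine that steps through the five states named in the caption to Figure 1. The sketch below is a schematic illustration, not a biophysical model. The class and function names are hypothetical, and the direct reset from the inward open state to the outward open state is a simplifying assumption.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    OUTWARD_OPEN = "outward open"
    OUTWARD_OCCLUDED = "outward occluded"
    FULLY_OCCLUDED = "fully occluded"
    INWARD_OCCLUDED = "inward occluded"
    INWARD_OPEN = "inward open"


# Order of states while one substrate molecule is imported.
IMPORT_CYCLE = list(State)


def open_side(state: State) -> Optional[str]:
    """Side of the membrane from which the binding site can be reached."""
    if state is State.OUTWARD_OPEN:
        return "exterior"
    if state is State.INWARD_OPEN:
        return "cytoplasm"
    return None  # a gate is closed in every occluded state


def next_state(state: State) -> State:
    """Advance one step; after release the empty transporter resets (simplification)."""
    i = IMPORT_CYCLE.index(state)
    return IMPORT_CYCLE[(i + 1) % len(IMPORT_CYCLE)]


state = State.OUTWARD_OPEN
for _ in range(len(IMPORT_CYCLE)):
    print(f"{state.value:17s} site reachable from: {open_side(state)}")
    state = next_state(state)
```

The point of the design is that `open_side` never returns both sides at once: the binding site is exposed to the exterior or to the cytoplasm, but not to both at the same time.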

The development of methods for determining
the structures of membrane proteins in the
past few years has produced near-complete 
pictures of the translocation mechanisms 
of several classes of transporter — that is, 
the global rearrangements that the pro- 
teins undergo during translocation cycles 
of substrate binding, transport and release 
have been visualized at atomic resolution. 
Intuitively, the substrate specificity of
transporters has generally been found to depend
on the amino-acid residues at the binding
site. The structure of PfHT1 now implies
that another mechanism affecting substrate
specificity might be at play.

Figure 1 | The alternating-access mechanism. Transporter proteins facilitate the
passage of substrate molecules across cell membranes. Access to the substrate-binding
site in the middle of transporters is controlled by two gates (red). a, In
the outward open state, a pathway from the cell exterior allows substrates into
the protein. b, In the outward occluded state, a substrate is trapped between the
gates, but the outward-facing pathway is still present. c, In the fully occluded
state with a bound substrate, no pathways are available. d, In the inward occluded
state, a pathway to the cytoplasm has formed, but the gate remains closed. e, In
the inward open state, substrates can exit to the cytoplasm. Qureshi et al. report
the structure of PfHT1, a sugar transporter from the malaria parasite Plasmodium
falciparum. They find that the binding of a sugar substrate to the structure shown
in a is coupled to the gating mechanism, and that the transition from a to c occurs
much faster than in other sugar transporters. This explains why PfHT1 transports a
wide range of sugar molecules equally effectively, unlike other sugar transporters.

Red blood cells infected by P. falciparum
consume about 100 times more glucose than
do non-infected cells because the parasite
continuously metabolizes sugars from these
cells to support its growth and replication.
Because PfHT1 is responsible for transporting
sugars from host cells, it has a crucial role
in supporting this metabolism. It belongs
to the well-studied major facilitator
superfamily (MFS) of transporters, which promote
the diffusion of substrates across the cellular
membrane. It has the same overall 3D
structure as the distantly related human GLUT
transporters. But whereas these specialize in
the transport of either D-glucose or D-fructose,
PfHT1 transports both of these sugars, and
some others, with comparable efficiency.

Qureshi et al. resolved the 3D structure of
PfHT1 in which D-glucose is captured in the
sugar-binding site, and found that the protein
was in a fully occluded conformation; that is,
the transporter protein completely shielded
the sugar from the aqueous environments on
either side of the cell membrane. The structure
therefore provides a snapshot of the substrate
during a part of the translocation cycle that
had not previously been visualized for an MFS
transporter.

Armed with their structure, the authors
carried out extensive transport studies
to try to work out why PfHT1 has less
substrate selectivity than its human GLUT
counterparts. They first demonstrated that
the same set of amino-acid residues in PfHT1
is required to bind D-glucose and D-fructose.
They then replaced residues in and around
the sugar-binding site of PfHT1 by residues
found in GLUT transporters, but none of
these mutations conferred GLUT-like
selectivity on the resulting proteins. They thus
concluded that the unusual lack of
selectivity of PfHT1 cannot be explained on the basis
of the sugar-binding residues alone.

So how can the substrate promiscuity of
PfHT1 be explained? It has been known since
the first structures of MFS transporters were
reported in 2003 that bundles of α-helices
in the proteins ‘rock’ around the central
substrate-binding site, thereby establishing the
alternating pathways for substrates through
the protein: an outward-facing pathway, which
allows substrates into the transporter from the
cell exterior, and an inward-facing pathway
that allows substrates to enter the cytoplasm
(Fig. 1). By considering their structure of
the fully occluded state of PfHT1 alongside
structures of other sugar transporters
captured at different stages in the translocation
of D-glucose, Qureshi et al. were able to
describe a complete translocation cycle.

The authors found that, surprisingly, all of
the sugar-binding residues maintain their
orientations throughout the cycle. This implies
that the switches from the outward-facing
conformation of PfHT1 to the fully occluded state,
and then to the inward-facing conformation,
are not driven by structural rearrangements at
the sugar-binding site. Instead, they are driven
by a previously unknown mechanism.

Qureshi and co-workers’ analysis of the
gating mechanism of PfHT1 revealed
interactions involving hydrophilic amino-acid
residues in two transmembrane α-helices in
the occluded state. By contrast, in human GLUT
proteins, the equivalent residues are larger
and more hydrophobic. Experiments in which
the authors substituted these gating residues
in PfHT1 with other residues demonstrated
that they are crucial for sugar transport.
Notably, the gating residues are about 15 ångströms
away from the sugar-binding site, a large
distance. This indicates that the binding
of a sugar is coupled to remote
conformational changes associated with gating of the
transporter, a type of mechanism known as
allosteric coupling. Thus, the ability of PfHT1,
unlike its human counterparts, to transport
many similar substrates results from its
substrate-driven gating dynamics, which
allows it to adopt the occluded conformation
more easily and rapidly.

The authors also carried out experiments 
to investigate how PfHT1 is inhibited by two 
small-molecule antimalarial drugs (C3361 
and MMV009085). This allowed them to 
identify a hydrophobic pocket in the trans- 
porter that probably facilitates the binding 
of inhibitory drug molecules, and that might 
help to guide the design of new antimalarial 
compounds. However, the most exciting 
finding is the allosteric coupling between 
substrate binding and gating — it suggests 
that substrate recognition in transporters 
can be a consequence of the transporter’s 
conformational dynamics, rather than being 
the result of protein-substrate interactions, 
which underpin the conventional ‘lock and key’ 
model of how molecules interact with their 
biological targets.


out of the blue


Adam Jaffe & Jeffrey R. Long 


Prussian blue analogues are archetypes of coordination solids, 
in which metal ions are bridged by ligands to form extended 
network structures. An analysis reveals a surprising ordering 
of the gaps found in their crystal lattices. See p.256 


The centuries-old pigment Prussian blue and 
its analogues are archetypes of compounds 
known as coordination solids, and have 
had an unparalleled role in advancing our 
understanding of inorganic chemistry and 
materials. The wide-ranging structural,
electronic, magnetic and optical properties
of Prussian blue analogues (PBAs) have been
repeatedly leveraged towards applications
that include energy storage, catalysis, ion
trapping and gas storage. However,
studying the surprisingly complex atomic-scale
structures of PBAs remains a long-standing
challenge. On page 256, Simonov et al.
report that they have successfully grown
single crystals of PBAs, which have previously
been notoriously elusive. By coupling X-ray
measurements of the crystal lattices with a
simple but effective theoretical model, the 
authors reveal an unexpected ordering of 
vacancies — absent nodes in the lattices that 
correspond to missing metal-anion units. This 
structural insight could enable yet another 
means of adjusting the properties of these 
extraordinary materials. 

Prussian blue (Fe₄[Fe(CN)₆]₃·14H₂O) was
first reported in 1710 and was widely used as
a deep-blue pigment. The eventual determination
of its crystal structure greatly expanded
the conceptual boundaries of inorganic
chemistry. X-ray diffraction experiments performed
on powders, and later on single crystals, of
Prussian blue revealed the parent structure
shared by all PBAs: a cubic framework in which
two different types of metal cation act as
‘nodes’ linked in three dimensions by cyanide
anion (CN⁻) ‘struts’ (Fig. 1a). PBAs therefore
have the general formula M[M′(CN)₆]ₓ, in which
M and M′ are chemically distinct metal ions;
the [M′(CN)₆]³⁻/⁴⁻ complex ion unit (Fig. 1b) is
known as a hexacyanometallate ion, and
carries either three or four negative charges. The
study of the PBA parent structure enriched our
fundamental understanding of the coordination
chemistry of transition metals (how ligand
molecules or ions bind to transition-metal
ions such as iron, cobalt and copper), and
demonstrated that coordination solids that
have multidimensional connectivity can act 
as porous framework materials through which 
molecules and ions can move. 

The idealized crystal structures of PBAs 
correspond to the cubic framework described 
above, but belie a hidden degree of complexity 
that is crucial in determining their physical 
properties. The true atomic-scale structures 
contain vacancies corresponding to absent 
hexacyanometallate ions (Fig. 1b), which 
form pores that are typically filled with water 
molecules. The concentration and ordering 
(networking) of vacancies control the path- 
ways through which mass can move within 
the materials, and can therefore tune the 
ability of PBAs to reversibly transport ions 
or small molecules. Insight into how vacancy 
ordering is affected by the chemistry of 
PBAs, or by the conditions used to synthesize 
them, can thus provide guidelines on how to 
tailor the properties of these compounds for
applications. 

X-ray-scattering measurements on PBA 
powders, beginning with the early diffraction 
studies on Prussian blue⁹, yielded structural
information for these compounds. But the 
random orientation of millions of crystallites 
in powders leads to loss of information that 
is retained if measurements are performed 
on single crystals. To gain this extra insight 
and illuminate vacancy behaviour, Simonov 
et al. sought to produce crystals of a series of
PBAs that contained different metal-ion com- 
binations. Growing single crystals of PBAs is 
challenging because of the rapidity with which 
microcrystalline powders precipitate when 
solutions of PBA precursors are combined. 
However, the authors found that controlled 
mixing of these solutions over the course of 
weeks produced single crystals suitable for 
X-ray-scattering analysis. 

Simonov and co-workers observed clear 
indicators of non-random ordering of vacan- 
cies in the scattering data for their PBA 
crystals. This ordering depends on each 
crystal’s chemical composition and the con- 
ditions used to crystallize it. To understand the 
diversity of the vacancy networks, the authors 
developed a simple two-part model to simu- 
late vacancy ordering. The model considers 
only the trade-off between the preference of 
these compounds to adopt a uniform vacancy 
distribution, and the preference for lattice 
sites to have a certain local symmetry, yet it 
effectively reproduces the experimental X-ray 
scattering results. 
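One way to picture a two-term trade-off of the kind described above is a toy lattice simulation. The sketch below is an illustration under stated assumptions, not the authors' model: the energy function, the parameter names `j_uniform` and `j_symmetry`, the lattice size and the swap-based annealing loop are all hypothetical choices made only to show how two competing preferences can be encoded.

```python
import math
import random

# Opposite pairs of nearest-neighbour directions around an M site.
OPPOSITE_PAIRS = [((1, 0, 0), (-1, 0, 0)),
                  ((0, 1, 0), (0, -1, 0)),
                  ((0, 0, 1), (0, 0, -1))]


def build_sites(size):
    """Split a periodic size^3 grid (size even, >= 4) into M and M' sites."""
    m_sites, mprime_sites = [], []
    for x in range(size):
        for y in range(size):
            for z in range(size):
                target = m_sites if (x + y + z) % 2 == 0 else mprime_sites
                target.append((x, y, z))
    return m_sites, mprime_sites


def energy(vacant, m_sites, size, j_uniform, j_symmetry):
    """Hypothetical energy: uneven vacancy counts and asymmetric M sites cost energy."""
    # Average number of vacant M' neighbours per M site (6 neighbours each).
    mean = 6 * len(vacant) / (size ** 3 / 2)
    total = 0.0
    for (x, y, z) in m_sites:
        count = 0
        for a, b in OPPOSITE_PAIRS:
            va = ((x + a[0]) % size, (y + a[1]) % size, (z + a[2]) % size) in vacant
            vb = ((x + b[0]) % size, (y + b[1]) % size, (z + b[2]) % size) in vacant
            count += va + vb
            if va != vb:
                # One side vacant, the other filled: local inversion symmetry is broken.
                total += j_symmetry
        total += j_uniform * (count - mean) ** 2
    return total


def anneal(size=4, n_vacant=10, j_uniform=1.0, j_symmetry=1.0,
           temperature=0.0, steps=2000, seed=0):
    """Move vacancies between M' sites, keeping their number fixed."""
    rng = random.Random(seed)
    m_sites, mprime_sites = build_sites(size)
    vacant = set(rng.sample(mprime_sites, n_vacant))
    start = current = energy(vacant, m_sites, size, j_uniform, j_symmetry)
    for _ in range(steps):
        old = rng.choice(sorted(vacant))
        new = rng.choice(mprime_sites)
        if new in vacant:
            continue
        trial = (vacant - {old}) | {new}
        e_trial = energy(trial, m_sites, size, j_uniform, j_symmetry)
        delta = e_trial - current
        # Metropolis rule; temperature == 0 accepts only moves that do not raise the energy.
        if delta <= 0 or (temperature > 0 and rng.random() < math.exp(-delta / temperature)):
            vacant, current = trial, e_trial
    return start, current, vacant
```

Changing the ratio of the two hypothetical weights shifts which arrangement the toy lattice settles into, which is the qualitative point of a two-parameter description; none of the numbers here correspond to a real compound.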

Notably, the authors’ insights enable the 
vacancy-network architectures of PBAs to be 
predicted by considering only a few factors 
that depend on the two model parameters,
such as the choice of metal, precursor
concentrations and the temperature of crystallization.
Some networks turn out to have
relatively direct pathways through which a
molecule or ion could move, whereas other
networks’ pathways are more tortuous. By
selecting PBAs that have direct pathways
facilitating mass transport, these materials
can be optimized for use as battery electrodes,
catalysts or ion-exchange materials.

Figure 1 | Vacancies in Prussian blue analogues. a, Compounds known as Prussian blue analogues (PBAs)
have the formula M[M′(CN)₆], where M and M′ are two chemically distinct metal atoms. The idealized crystal
structure of a PBA is a cubic framework in which M and M′ ions act as ‘nodes’ connected by cyanide ions (CN⁻),
which act as ‘struts’. b, The actual crystal structures contain vacancies (gaps in the lattice that correspond
to missing [M′(CN)₆]³⁻/⁴⁻ units). Networks of vacancies can form pathways that allow molecules or ions to be
transported through PBAs, a potentially useful characteristic. Simonov et al.⁷ have used X-ray measurements
of single crystals of PBAs and numerical modelling to reveal the hidden order of vacancies in PBAs.

Simonov and colleagues’ work addresses 
a long-standing lack of detailed knowledge 
about the structural vacancies that determine 
the physical properties of Prussian blue and its 
analogues. But numerous challenges remain 
before the predictive potential of their results 
can be fully realized. Although remarkably 
effective, the modelling analysis does not 
consider further possible complexities, such 
as the effects of ionic species that dwell in the
PBA pores. Extrapolation of the findings from
these single-crystal studies to powder samples,
which are more technologically relevant,
will require further challenging experiments
and enhanced modelling that considers the
surface structure and chemistry of microparticles.
Great care will also be needed to
work out how each of the variables in a PBA
synthesis correlates with the resulting vacancy
ordering and material properties. 

Although these challenges necessitate 
substantial further work, they also represent 
an opportunity to exert even greater control
over the properties of PBAs, guided by a
deeper understanding of structure–property
relationships. Refinement of more-complex 
models will dictate how to take advantage of 
the many variables of a PBA synthesis. Not only 
has this work resulted in new-found control 
over the optimization of PBAs for applications 
in energy storage, ion capture and catalysis, 
but it also represents a platform on which to 
build a similar understanding of other framework
materials, such as zeolites¹¹ and
metal–organic frameworks¹², which have their own
sets of challenges and promising applications. 


Adam Jaffe and Jeffrey R. Long are in the 
Department of Chemistry, University of 
California, Berkeley, Berkeley, California 
94720, USA. J.R.L. is also in the Department 
of Chemical and Biomolecular Engineering, 
University of California, Berkeley, and in
the Materials Sciences Division, Lawrence
Berkeley National Laboratory, Berkeley. 
e-mails: adamjaffe@berkeley.edu; 
jrlong@berkeley.edu 


1. Sharpe, A. G. in The Chemistry of Cyano Complexes of
the Transition Metals (eds Maitlis, P. M., Stone, F. G. A. &
West, R.) 1–302 (Academic, 1976).
2. Dunbar, K. R. & Heintz, R. A. in Progress in Inorganic
Chemistry Vol. 45 (ed. Karlin, K. D.) 283–391 (Wiley, 1996).
3. Song, J. et al. J. Am. Chem. Soc. 137, 2658–2664 (2015).
4. Kruper, W. J. Jr & Swart, D. J. US patent 4,500,704 (1985).
5. Kawamoto, T. et al. Synthesiology Eng. Ed. 9, 139–154 (2016).
6. Kaye, S. S. & Long, J. R. J. Am. Chem. Soc. 127, 6506–6507
(2005).
7. Simonov, A. et al. Nature 578, 256–260 (2020).
8. Frisch, J. L. Miscellanea Berolinensia ad incrementum
scientiarum 1, 377–378 (1710).
9. Keggin, J. F. & Miles, F. D. Nature 137, 577–578 (1936).
10. Buser, H. J., Schwarzenbach, D., Petter, W. & Ludi, A.
Inorg. Chem. 16, 2704–2710 (1977).
11. Baerlocher, C. et al. Nature Mater. 7, 631–635
(2008).
12. Trickett, C. A. et al. Angew. Chem. Int. Edn 54, 11162–11167
(2015).


Neurodegeneration


A protein’s structure
used to diagnose disease


Juan Atilio Gerez & Roland Riek 


Parkinson’s disease and multiple system atrophy involve the
protein α-synuclein. Proof that aggregated α-synuclein adopts
a different structure in each case suggests that its conformation
underlies the distinct disorders. See p.273


A snowflake begins life as a tiny crystal that
acts as a seed on which water molecules aggregate,
increasing the size of the snowflake as
it descends to earth. Proteins can also act as
seeds: for instance, in a class of age-related
disorders called amyloid diseases, in which
thousands of copies of a type of protein known
as an amyloid adopt an abnormal structure and
aggregate in harmful clumps. In Parkinson’s
disease, aggregates of the amyloid protein
α-synuclein accumulate in neurons. A rarer
neurodegenerative disease, multiple system
atrophy (MSA), involves α-synuclein aggregates
in neuron-supporting cells called glia.
It can be difficult to distinguish between the
two disorders, given their overlapping symptoms,
but they require different treatments.
Shahnawaz et al.¹ provide an explanation for
this difference on page 273: like two dissimilar
snowflakes composed of identical water
molecules, α-synuclein aggregates form distinct
3D architectures in each disease.

In vitro and animal experiments have pre- 
viously indicated that different aggregate 
structures of α-synuclein, called strains, yield
different effects”. The various α-synuclein
strains not only can have distinct cell-killing 
abilities and different seeding and propaga- 
tion properties, but also can target different 
cell types and areas of the mammalian brain**. 

Shahnawaz et al. built on these previous 
findings using a technique called protein 
misfolding cyclic amplification (PMCA), 
which amplifies small amounts of α-synuclein
aggregate, allowing thorough examination 
of minuscule samples. An amyloid-specific 
fluorescent dye is incorporated into the newly 
formed aggregates, enabling their analysis. 

Impressively, the authors amplified and 
analysed samples from the cerebrospinal 
fluid of more than 200 people who had 
either Parkinson’s disease or MSA, or who 
were healthy (Fig. 1). They found that samples
taken from people with Parkinson’s disease 
displayed more fluorescence than those from 
people with MSA. Thus, PMCA could be used 
to discriminate between Parkinson’s disease 
and MSA. 

The different levels of fluorescence 
suggested that the amyloid dye interacted with 
each α-synuclein aggregate differently, and
that distinct α-synuclein strains are involved
in the two diseases. The authors confirmed this
result by showing that the two strains could
also be distinguished by using proteinase K
digestion (an enzymatic treatment that breaks
down strains that have different structures in
different ways), and through other biophysical
characterizations, including a microscopy
approach called cryo-electron tomography.

Shahnawaz and colleagues’ work has two 
major implications. First, it demonstrates 
that PMCA can be used as a diagnostic tool 
to discriminate between diseases involving 
α-synuclein. However, it should be noted
that the samples analysed in this study were 
obtained from people who had already been 
diagnosed, and it remains unclear whether 
the approach could be used as a predictive 
tool to detect disease at earlier stages. Moreover,
it is possible that PMCA is affected by the
medication given to the participants who had
Parkinson’s disease. These people typically
receive levodopa (L-dopa, a precursor of dopamine),
which has been shown to affect α-synuclein
aggregation in vitro’.

Second, the study adds to a growing body 
of evidence supporting the ‘one polymorph, 
one disease’ hypothesis® °, which states that 
different structural forms (polymorphs) of the
same aggregated protein can cause distinct 
pathologies and symptoms. What might 
lead a protein to adopt different structures? 
In vitro, distinct fold structures can result 
from distinct environmental conditions. For
example, different α-synuclein polymorphs
arise depending on whether the protein is kept
in a phosphate-containing or phosphate-free
buffer’. In vivo, α-synuclein is exposed to several
environments. Indeed, the neurons that
degenerate in Parkinson’s disease and the
glia affected in MSA belong to different cell
lineages, and have markedly different intracellular
environments. In addition, α-synuclein
can move between cells, exposing it to both
intra- and extracellular environments’.

Figure 1 | Different structures for the α-synuclein protein. Two neurodegenerative disorders,
Parkinson’s disease and multiple system atrophy (MSA), involve aggregates of α-synuclein, which are
found in neurons and neuron-supporting glial cells, respectively. Shahnawaz et al.¹ have demonstrated that
α-synuclein adopts different structures in each disease, indicating that the structure of the protein might
contribute to the distinct nature of each disorder. The group extracted tiny amounts of α-synuclein from
cerebrospinal fluid (CSF) samples. Protein amplification and analyses revealed different structures for
the two samples. These analyses were sufficient to discriminate between the diseases in around 95% of the
200 people studied.

The idea of different polymorphs in disease 
dates back to studies of prion proteins’ in the 
1990s. Much like amyloids, prions aggregate 
in harmful infectious clumps to cause neurodegenerative
conditions such as Creutzfeldt–Jakob
disease in humans and scrapie in sheep.
Several strains of prion, each adopting a
different polymorph, typically coexist in
a given sample or organism’. The strains have
different fitnesses in different environments,
which governs their ability to replicate’, a
phenomenon known as the prion cloud”.

A corollary of this idea is that if environmental
conditions change, the relative abundance
of each polymorph might change. This principle
also governs the PMCA assay. Under given
conditions, the fittest polymorphs should be
amplified from a possible mix of pre-existing
strains. Indeed, in Shahnawaz and colleagues’
experiments, a single distinct polymorph was 
amplified from Parkinson’s disease samples 
and another from MSA samples. 

By contrast, in another recent study that 
used PMCA, Strohäker and colleagues”
reported no significant differences between
structures of α-synuclein derived from the
brains of people who had Parkinson’s disease
and those from people with MSA. A possible
explanation for this apparent discrepancy
is that the two groups used different PMCA
protocols. In addition, Strohäker et al. used
a much smaller group of patients than did
Shahnawaz and colleagues. In fact, analysis
using nuclear magnetic resonance spectroscopy
did indicate distinct structural features in
a subset of Strohäker and colleagues’ samples.

High-resolution cryo-electron microscopy
has been used to demonstrate the existence
of distinct disease-specific polymorphs of
another neurodegeneration-associated
protein, tau, at atomic resolution®. A similar
approach using samples extracted under mild
conditions might give us a clearer picture of
the reality for α-synuclein. Together with
similar observations for Alzheimer’s disease”,
these findings are broadening our understanding
of the structural landscape of amyloid diseases.


Medical research


Smoke signals in the
DNA of normal lung cells


Gerd P. Pfeifer 


Healthy cells in smokers’ lungs have a high burden of 
mutations, similar to the mutational profile of lung cancer. 
Surprisingly, ex-smokers’ lungs have a large fraction of healthy 
cells with nearly normal profiles. See p.266 


According to the World Health Organization, 
there are 1.1 billion smokers worldwide and an
estimated 1.8 million deaths from lung cancer 
annually. Lung cancer caused by smoking can 
take decades to arise, and smokers have up 
to a 30-fold higher risk of developing the 
disease than do non-smokers’. Carcinogenic 
components of tobacco smoke promote lung 
cancer by causing DNA damage that can lead 
to mutations through known mechanisms, 
but what the initial consequences of smoking
are for healthy lung cells is poorly under- 
stood. On page 266, Yoshida et al. report the 
mutational profiles of 632 healthy lung cells 
obtained from whole-genome sequencing of 
biopsied tissue from 16 individuals: children, 
adults, non-smokers, current smokers and 
ex-smokers. The authors analysed the fre- 
quency and properties of the mutations 
present, how they differed according to age
and smoking status, and how these mutations
related to those found in a type of lung cancer
called squamous-cell carcinoma.

The authors dissociated cells from lung 
tissue (Fig. 1) and isolated a type of epithelial 
cell called a basal cell (which can self-renew). 
Growing single cells into cellular colonies 
allowed the authors to determine the DNA 
sequence of the given original cell. A poten- 
tial caveat of the study is that, although the 
authors obtained the genome sequences of 
hundreds of single cells, the number of indi- 
viduals with each different smoking status was 
relatively small. The authors report that the 
number of single nucleotide (point) mutations 
increased with age — for each extra year of 
life, about 22 additional such mutations were 
found per cell. 

However, being a former smoker added 
another 2,330, and being a current smoker 
added 5,300 point mutations per cell on 
average, confirming the mutational potency 
of smoking. Smokers’ genomes also had extensive
examples of other types of alteration,
such as insertion or deletion mutations. The
number of mutations in different cells from 
the same individual could vary by tenfold in 
smokers, a much higher variability than was 
found in non-smokers. The stage of the cell 
cycle at which a cell is exposed to carcino- 
genic agents might affect how effectively DNA 
damage is repaired before DNA replication, 
which could offer an explanation for this high 
variability. 
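The averages quoted above can be combined into a rough back-of-the-envelope estimate. The sketch below assumes, purely for illustration, that the age effect and the smoking effect simply add and that there is no baseline offset; the function name and the status labels are hypothetical, and the large cell-to-cell variability reported for smokers means that an individual cell can differ greatly from such an average.

```python
# Average figures quoted in the text (point mutations per cell).
RATE_PER_YEAR = 22            # added per extra year of life
EXTRA_EX_SMOKER = 2330        # added for being a former smoker
EXTRA_CURRENT_SMOKER = 5300   # added for being a current smoker


def expected_point_mutations(age_years, status="never"):
    """Additive estimate; the additive form itself is an assumption."""
    extra = {
        "never": 0,
        "ex": EXTRA_EX_SMOKER,
        "current": EXTRA_CURRENT_SMOKER,
    }[status]
    return RATE_PER_YEAR * age_years + extra


for status in ("never", "ex", "current"):
    print(status, expected_point_mutations(60, status))
```

For a hypothetical 60-year-old, this gives 1,320 point mutations per cell for a never-smoker, 3,650 for an ex-smoker and 6,620 for a current smoker.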

Yoshida and colleagues examined the 
mutations in individual cells using previously 
developed algorithms to focus on all the types
of sequence alteration possible (for example, 
mutation of the DNA base adenine to cytosine, 
guanine or thymine) and also to assess the 
bases on either side of a mutated base. Such 
analysis identifies specific patterns (muta- 
tional signatures) that have been used before 
to characterize the genomes of tumour cells’. 

The authors report that the presence of 
certain mutational signatures increased 
with age and did not seem to be affected by 
smoking. These included a signature attrib- 
uted to natural processes whereby the loss 
of an amino group in a modified cytosine 
(termed 5-methylcytosine) changes the 
base to a thymine. The most common muta- 
tional signature in all the samples was one 
that is rich in cytosine-to-thymine and thy- 
mine-to-cytosine mutations. The presence 
of this signature increased with age and was 
more common in people with a history of 
smoking. The underlying processes driving 
these mutations are unknown. The most 
common smoking-dependent signature 
consisted of guanine-to-thymine mutations, 
a signature that is characteristic of most 
smoking-associated lung cancers*”. 

Lung cancers have some of the highest 
mutation frequencies of all tumour types®; 
however, it is thought that only a small number
of tumour-promoting (driver) mutations need
to occur in a single cell to kick off malignant
growth. Given the high mutational burden and
the specific smoking-associated mutational
signatures found in smokers’ healthy epithelial
cells, Yoshida and colleagues examined
whether these mutations affected crucial
genes that are relevant for cancer growth.

Figure 1 | Mutational burdens in normal human lung cells. Yoshida et al.² analysed the pattern of
mutations in healthy lung tissue in non-smokers, current smokers and ex-smokers. a, Using biopsied lung
tissue, the authors determined whole-genome sequences corresponding to single cells. b, The cells of the
non-smoking individuals had few mutations. By contrast, current smokers had a high proportion of cells with
a large number of mutations (grey; darker colour indicates more mutations), and many of these mutations
were of a type predominantly found in smokers. Compared with non-smokers, smokers also had greater
variability in the mutational load between the different cells of a given individual. Surprisingly, the authors
found that five out of six ex-smokers had a substantial fraction (20–50%) of cells that had low numbers of
mutations and had hardly any smoking-associated mutational signatures. How these cells arise is a mystery;
Yoshida et al. speculate that they are generated from a population of as-yet-unknown stem cells.

Indeed, they found cells that had acquired
mutations in genes, including TP53 and
NOTCH1, that are known drivers of
squamous-cell carcinomas. These driver mutations
were more common in the lung cells of
smokers than in those of non-smokers. Some
cells even had as many as three driver muta- 
tions. However, we do not know how many of 
these mutations (and in what combination) are 
required for human lung cancer to develop. 
Specific TP53 mutations were found in multi- 
ple cells from the same individual, suggesting 
that these mutations occur early, that cells 
with the mutation proliferate, or both — simi- 
lar to what has been observed for sun-exposed 
healthy human skin’. 

The higher risk of lung cancer in 
ex-smokers compared with non-smokers is 
reflected in their high mutation burden and 
the signature of smoking-associated muta- 
tions in most of their lung cells (similar to the 
cellular profile of current smokers). Although 
ex-smokers have a high risk of developing lung 
cancer, their risk is reduced compared with 
that of current smokers, and this lowering 
depends on the length of time of smoking
cessation’. Why this is the case has been hard to 
explain. However, perhaps the most surprising 
result of Yoshida and colleagues’ work might 
offer a clue: in 5 out of 6 ex-smokers, 20-50% 
of the cells had a low mutation burden that was
similar to the profile of non-smokers of the 
same age range (Fig. 1). 

These near-normal cells in ex-smokers 
had a low frequency of smoking-dependent 
mutational signatures. Moreover, compared 
with the ex-smokers’ highly mutated cells, 
these near-normal cells had longer versions 
of DNA structures called telomeres, which are
found at the ends of chromosomes. Telomere 
length shortens with each cell division; thus, 
long telomeres suggest that these cells had 
not undergone many divisions. The authors 
speculate that these cells might have arisen
comparatively recently from divisions of hypothesized,
previously dormant (quiescent) stem
cells. However, whether such cells exist in
human lungs is unknown. 

DNA damage can generate a mutation 
during DNA replication. Therefore, if a popu- 
lation of non-dividing stem cells exists in the 
human lung, even if exposed to carcinogenic 
agents, perhaps such cells might avoid incur- 
ring mutations if DNA damage is eventually 
repaired in the absence of division. But the lack 
of knowledge about these proposed long-lived 
stem cells and information about the longevity 
of the different cell types in the human lung 
make it difficult to explain what occurred in 
these ex-smokers’ cells with few mutations.

Why do ex-smokers still have a substan- 
tial fraction of highly mutated cells that can 
proliferate, at least when grown in vitro? Any 
short-lived cells that were exposed to 
carcinogens during their proliferation should 
have vanished many years after the cessa- 
tion of smoking. This raises the question of 
whether there are long-lived differentiated 
cells in the lung that carry a high mutational 
burden, and whether these cells can resume 
proliferation, perhaps because of the plasticity 
(the ability to change cellular identity) of lung 
cells’®. A future challenge will be to understand 
the cell biology of the mechanisms under- 
lying these observations. Perhaps one day it 
will be possible to develop ways to boost the 
population of lung cells with few mutations 
in ex-smokers. 

Yoshida and colleagues’ study has broad- 
ened our understanding of the effects of 
tobacco smoke on normal epithelial cells in
the human lung. It has shed light on how the
protective effect of smoking cessation plays 
out at the molecular level in human lung tissue 
and raises many interesting questions worthy 
of future investigation. 


Gerd P. Pfeifer is at the Center for Epigenetics,
Van Andel Institute, Grand Rapids,
Michigan 49503, USA.
e-mail: gerd.pfeifer@vai.org
Forum: Mental health


Digital technology
under scrutiny


Does time spent using digital technology and social media 
have an adverse effect on mental health, especially that of 
adolescents? Here, two scientists discuss the question, and 
how digital devices might be used to improve well-being. 


The topic in brief 


• There is an ongoing debate about
whether social media and the use of
digital devices are detrimental to mental
health.

• Adolescents tend to be heavy users
of these devices, and especially of social
media.

• Rates of teenage depression began to
rise around 2012, when adolescent use of
social media became common (Fig. 1).

• Some evidence indicates that frequent
users of social media have higher rates
of depression and anxiety than do light
users.

• But perhaps digital devices could
provide a way of gathering data about
mental health in a systematic way, and
make interventions more timely.


Jonathan Haidt 
A guilty verdict 


A sudden increase in the rates of depression, 
anxiety and self-harm was seen in adolescents 
— particularly girls — in the United States and 
the United Kingdom around 2012 or 2013 
(see go.nature.com/2up38hw). Only one sus- 
pect was in the right place at the right time to 
account for this sudden change: social media. 
Its use by teenagers increased most quickly
between 2009 and 2011, by which point two-thirds
of 15–17-year-olds were using it on a daily
basis¹. Some researchers defend social media,
arguing that there is only circumstantial evidence
for its role in mental-health problems””.
And, indeed, several studies”? show that there
is only a small correlation between time spent
on screens and bad mental-health outcomes.
However, I present three arguments against 
this defence. 
First, the papers that report small or null
effects usually focus on ‘screen time’, but it 
is not films or video chats with friends that 
damage mental health. When research papers 
allow us to zoom in on social media, rather 
than looking at screen time as a whole, the cor- 
relations with depression are larger, and they 
are larger still when we look specifically at girls 
(go.nature.com/2u74der). The sex difference 
is robust, and there are several likely causes for 
it. Girls use social media much more than do 
boys (who, in turn, spend more of their time 
gaming). And, for girls more than boys, social 
life and status tend to revolve around intimacy 
and inclusion versus exclusion‘, making them 
more vulnerable to both the ‘fear of missing 
out’ and the relational aggression that social 
media facilitates. 

Second, although correlational studies can 
provide only circumstantial evidence, most 
of the experiments published in recent years
have found evidence of causation
(go.nature.com/2u74der). In these studies, people are
randomly assigned to groups that are asked 
to continue using social media or to reduce 
their use substantially. After a few weeks, 
people who reduce their use generally report 
an improvement in mood or a reduction in 
loneliness or symptoms of depression. 

Third, many researchers seem to be think- 
ing about social media as if it were sugar: safe 
in small to moderate quantities, and harmful 
only if teenagers consume large quantities. 
But, unlike sugar, social media does not act 
just on those who consume it. It has radically 
transformed the nature of peer relationships, 
family relationships and daily activities>. 
When most of the 11-year-olds in a class are on
Instagram (as was the case in my son’s school),
there can be pervasive effects on everyone. 
Children who opt out can find themselves iso- 
lated. A simple dose-response model cannot 
capture the full effects of social media, yet 
nearly all of the debate among researchers so 
far has been over the size of the dose-response 
effect. To cite just one suggestive finding of 
what lies beyond that model: network effects 
for depression and anxiety are large, and bad 
mental health spreads more contagiously 
between women than between men’. 

In conclusion, digital media in general 
undoubtedly has many beneficial uses, includ- 
ing the treatment of mental illness. But if you 
focus on social media, you'll find stronger evi- 
dence of harm, and less exculpatory evidence, 
especially for its millions of under-age users. 

What should we do while researchers hash 
out the meaning of these conflicting findings?
I would urge a focus on middle schools
(roughly 11-13-year-olds in the United States), 
both for researchers and policymakers. Any 
US state could quickly conduct an informa- 
tive experiment beginning this September: 
randomly assign a portion of school districts 
to ban smartphone access for students in
middle school, while strongly encouraging
parents to prevent their children from opening 
social-media accounts until they begin high 
school (at around 14). Within 2 years, we would 
know whether the policy reversed the other- 
wise steady rise of mental-health problems 
among middle-school students, and whether 
it also improved classroom dynamics (as rated 
by teachers) and test scores. Such system-wide 
and cross-school interventions would be an 
excellent way to study the emergent effects 
of social media on the social lives and mental 
health of today’s adolescents. 


Jonathan Haidt is at New York University Stern 
School of Business, New York, New York 10012, 
USA. 


Nick Allen 
Use digital technology 
to our advantage 


It is appealing to condemn social media out 
of hand on the basis of the — generally rather 
poor-quality and inconsistent — evidence 
suggesting that its use is associated with 
mental-health problems’. But focusing only on 
its potential harmful effects is comparable to 
proposing that the only question to ask about 
cars is whether people can die driving them. 
The harmful effects might be real, but they 
don’t tell the full story. The task of research 
should be to understand what patterns of 
digital-device and social-media use can lead 
to beneficial versus harmful effects’, and to 
inform evidence-based approaches to policy, 
education and regulation. 

Long-standing problems have hampered 
our efforts to improve access to, and the 
quality of, mental-health services and sup- 
port. Digital technology has the potential 
to address some of these challenges. For 
instance, consider the challenges associated 
with collecting data on human behaviour. 
Assessment in mental-health care and research 
relies almost exclusively on self-reporting, but 
the resulting data are subjective and burden- 
some to collect. As a result, assessments are 
conducted so infrequently that they do not 
provide insights into the temporal dynamics 
of symptoms, which can be crucial for both 
diagnosis and treatment planning. 

By contrast, mobile phones and other Inter- 
net-connected devices provide an opportunity 
to continuously collect objective informa- 
tion on behaviour in the context of people’s 
real lives, generating a rich data set that can 
provide insight into the extent and timing of 
mental-health needs in individuals. By building 
apps that can track our digital exhaust 
(the data generated by our everyday digital 
lives, including our social-media use), we 
can gain insights into aspects of behaviour 
that are well-established building blocks of 
mental health and illness, such as mood, social 
communication, sleep and physical activity. 

Figure 1 | Depression on the rise. Rates of depression among teenagers in the United States have increased 
steadily since 2012. Rates are higher and are increasing more rapidly for girls than for boys. Some researchers 
think that social media is the cause of this increase, whereas others see social media as a way of tackling it. 
(Data taken from the US National Survey on Drug Use and Health, Table 11.2b; go.nature.com/3ayjaww) 

These data can, in turn, be used to empower 
individuals, by giving them actionable insights 
into patterns of behaviour that might other- 
wise have remained unseen. For example, 
subtle shifts in patterns of sleep or social com- 
munication can provide early warning signs 
of deteriorating mental health. Data on these 
patterns can be used to alert people to the 
need for self-management before the patterns 
— and the associated symptoms — become 
more severe. Individuals can also choose to 
share these data with health professionals 
or researchers. For instance, in the Our Data 
Helps initiative (https://ourdatahelps.org), 
individuals who have experienced a suicidal 
crisis, or the relatives of those who have died 
by suicide, can donate their digital data to 
research into suicide risk. 

Because mobile devices are ever-present 
in people’s lives, they offer an opportunity to 
provide interventions that are timely, person- 
alized and scalable. Currently, mental-health 
services are mainly provided through a 
century-old model in which they are made 
available at times chosen by the mental-health 
practitioner, rather than at the person’s time of 
greatest need. But Internet-connected devices 
are facilitating the development of a wave of 
‘just-in-time’ interventions for mental-health 
care and support. 

A compelling example of these interventions 
involves short-term risk for suicide, 
for which early detection could save many 
lives. Most of the effective approaches to 
suicide prevention work by interrupting 
suicidal actions and supporting alternative 
methods of coping at the moment of greatest 
risk. If these moments can be detected in an 
individual’s digital exhaust, a wide range of 
intervention options become available, from 
providing information about coping skills 
and social support, to the initiation of crisis 
responses. So far, just-in-time approaches 
have been applied mainly to behaviours such 
as eating or substance abuse. But with the 
development of an appropriate research base, 
these approaches have the potential to provide 
a major advance in our ability to respond 
to, and prevent, mental-health crises. 

These advantages are particularly relevant 
to teenagers. Because of their extensive use 
of digital devices, adolescents are especially 
vulnerable to the devices’ risks and burdens. 
And, given the increases in mental-health 
problems in this age group, teens would 
also benefit most from improvements in 
mental-health prevention and treatment. If we 
use the social and data-gathering functions of 
Internet-connected devices in the right ways, 
we might achieve breakthroughs in our ability 
to improve mental health and well-being. 


Nick Allen is at the Center for Digital Mental 
Health, University of Oregon, Eugene, Oregon 
97403-1227, USA. 

e-mail: nallen3@uoregon.edu 


Genome editing, which involves the precise manipulation of cellular DNA sequences 
to alter cell fates and organism traits, has the potential to both improve our 
understanding of human genetics and cure genetic disease. Here I discuss the 
scientific, technical and ethical aspects of using CRISPR (clustered regularly 
interspaced short palindromic repeats) technology for therapeutic applications in 
humans, focusing on specific examples that highlight both opportunities and 
challenges. Genome editing is, or will soon be, in the clinic for several diseases, with 
more applications under development. The rapid pace of the field demands active 
efforts to ensure that this breakthrough technology is used responsibly to treat, cure 
and prevent genetic disease. 


In the nearly seventy years since the discovery of the DNA double helix, 
technologies have advanced for the determination, analysis and altera- 
tion of genome sequences and gene-expression patterns in cells and 
organisms. These molecular tools are the foundation of molecular 
biology, driving the therapeutic industry by increasing the understand- 
ing of the genetics of normal and disease traits. The ability to diagnose 
genetic diseases has developed rapidly with reductions in the costs 
of genome sequencing, increases in comparative analyses of human 
genome sequences and increased applications of high-throughput 
genomic screening. However, the dearth of therapies, much less cures, 
for genetic diseases has created a growing separation between diagnos- 
tics and treatments, underscoring the urgent need to develop thera- 
peutic options. Mitigation or correction of disease-causing mutations 
is a tantalizing goal with tremendous potential to save and improve 
lives, representing a convergence of technical and medical advances 
that could eventually eradicate many genetic diseases. 

Although methods for genome engineering and gene therapy have 
been of interest for decades, the development of engineered and pro- 
grammable enzymes for the manipulation of DNA sequences has driven 
a biotechnological revolution. In particular, fundamental research 
showing how CRISPR and CRISPR-associated (Cas) proteins provide 
microorganisms with adaptive immunity has propelled transforma- 
tive technological opportunities enabled by RNA-guided proteins. 
CRISPR-Cas9 and related enzymes have been used to manipulate the 
genomes of cultured and primary cells, animals and plants, vastly accel- 
erating the pace of fundamental research and enabling breakthroughs 
in agriculture and synthetic biology. Building on past gene therapy 
efforts, we are entering an era in which genome-editing tools will be 
used to inactivate or correct disease-causing genes in patients, offering 
life-saving cures to people who have genetic disorders. 

In this Review, I discuss the therapeutic opportunities of genome 
editing, the ability to alter the DNA in cells and tissues in a site-specific 
manner. In addition to presenting current capabilities and limitations 
of the technology, I also describe what it will take to apply therapeutic 
genome editing in the real world. A comparison of somatic-cell and 
germline editing highlights the importance of open public discussion 
about, and regulation of, this powerful technology. 


The scope of genome-editing applications 


Although the genetics of human disease are often complex, some of the 
most common genetic disorders stem from mutations in a single gene. 
Cystic fibrosis, Huntington’s chorea, Duchenne muscular dystrophy 
(DMD) and sickle cell anaemia each represent diseases that result from 
defects in only one gene in the human genome; such monogenic diseases, 
of which more than 5,000 are known, affect at least 250 million individuals 
globally. DNA sequencing of affected families has provided detailed 
information about the mutations that lead to each disorder, as well as 
correlations between specific genetic changes (genotype) and disease 
severity. These data in turn reveal DNA sequence alterations or corrections 
that could provide a genetic cure by either disrupting the function 
of a toxic or inhibitory gene or restoring the function of an essential gene. 

Sickle cell disease and muscular dystrophy, two common human 
genetic disorders, provide instructive examples of diseases that could 
be treated or cured by genome editing in the foreseeable future. Sickle 
cell disease results from a single base-pair change in the DNA that in 
turn generates a defective protein with destructive consequences in 
red blood cells. DMD belongs to a set of muscle-wasting diseases that 
result from DNA sequence changes that disrupt the normal production 
of a protein required for muscle strength and stability. A closer look at 
each of these diseases illustrates the ways that genome editing could 
offer therapeutic benefit to patients. 

Fig. 1 | Ex vivo and in vivo genome editing to treat human disease. 
a, b, Somatic genome-editing treatments may be accomplished in one of two 
ways: by removing and editing target cells in the laboratory before returning 
them to the patient (ex vivo, a) or by directly delivering CRISPR-Cas editing 
tools to the affected tissue (in vivo, b). a, Blood disorders such as sickle cell 
disease may be treated by editing haematopoietic stem or progenitor cells 
(HSPCs) ex vivo, creating normal red blood cells (RBCs). b, Disorders that 
affect non-removable tissues, such as DMD, require editing of affected cell 
types (in this case myogenic cells) in vivo. 

Sickle cell disease occurs in individuals who have two defective copies 
of the gene that encodes β-globin (HBB), the protein required to form 
oxygen-carrying haemoglobin in adult blood cells. Described originally 
by Linus Pauling and colleagues and mapped to a genetic locus in the 
1950s, a single A-to-T mutation results in a glutamate-to-valine 
substitution in β-globin (Fig. 1). This seemingly small change causes the 
defective protein to form chain-like polymers of haemoglobin, inducing 
red blood cells to assume a sickled shape that leads to occluded blood 
vessels, pain and life-threatening organ failure. Although bone-marrow 
transplantation can cure the disease, it requires the use of cells from 
an individual whose immune profile matches that of the patient. In 
principle, sickle cell disease could be cured by removing blood stem 
cells (that is, haematopoietic progenitor cells) from a patient and 
using genome editing to either correct the disease-causing mutation in 
β-globin or activate expression of γ-globin, a fetal form of haemoglobin 
that could substitute for defective β-globin (Fig. 1). The edited stem cells 
could then be transplanted back into the patient, in whom the progeny 
of these edited stem cells would produce healthy red blood cells. 
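
The single-base change described above can be made concrete with a short sketch. The text states only that an A-to-T mutation produces a glutamate-to-valine substitution; the specific codons used here (GAG for glutamate, GTG for valine) come from the standard genetic code and are an assumption added for illustration, as is the helper name `substitute`.

```python
# Minimal codon table: only the two codons needed for this illustration
# (standard genetic code; the codons are not given in the text).
CODON_TO_AMINO_ACID = {"GAG": "Glu", "GTG": "Val"}


def substitute(codon, position, new_base):
    """Return the codon with the base at `position` (0-based) replaced."""
    return codon[:position] + new_base + codon[position + 1:]


normal_codon = "GAG"
sickle_codon = substitute(normal_codon, 1, "T")  # the A-to-T change

print(normal_codon, "encodes", CODON_TO_AMINO_ACID[normal_codon])
print(sickle_codon, "encodes", CODON_TO_AMINO_ACID[sickle_codon])
```

Correcting the mutation by genome editing amounts to reversing this one substitution in the patient's blood stem cells.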
The ability to edit cells extracted from patients with sickle cell dis- 
ease makes this disease—and other blood disorders—one of the more- 
tractable pathologies that could be treated by genome editing in the 
near future. Most genetic diseases, however, will require genome edit- 
ing of cells in the body (in situ) to correct a genetic defect associated 
with a disease. Muscular dystrophy exemplifies this type of disorder, 
because it involves the weakening and disruption of skeletal muscles 
over time. The most common type, DMD, affects 1 in 5,000 males 
at birth, who inherit mutations in the gene that encodes dystrophin 
(DMD), a scaffolding protein that maintains the integrity of striated 
muscles (Fig. 1). Over time, these patients lose the ability to walk and 
eventually succumb to respiratory and heart failure, typically dying 
by the third decade of life. In contrast to therapies that delay disease 
progression, genome editing offers the possibility of permanent restoration 
of the missing dystrophin protein. Although more than 3,000 different 
mutations can cause DMD, most occur at hotspots within DMD. 
Notably, restoration of a small percentage (around 15%) of the normal 
expression levels of dystrophin can provide a clinical benefit. 

To treat or cure monogenic disorders such as sickle cell disease and 
DMD, it will be important to match the underlying genetic defect with 
the best genome-editing approach. In each case, this involves multiple 
considerations, including the type of editing needed, the mode of cell 
or tissue delivery required and the extent of gene knockout or correc- 
tion that will provide therapeutic value. 

The next section describes current genome-editing technologies 
that offer the potential of curative human genome editing. 


Genome-editing strategies 


Engineered DNA-cleaving enzymes, including zinc-finger nucleases 
(ZFNs) and transcription activator-like effector nucleases (TALENs), 
have demonstrated the potential of therapeutic genome editing. These 
early technologies enabled the inactivation of the gene encoding the 
HIV co-receptor CCR5 in somatic cells, mitigation of the HBB gene 
mutation in haematopoietic stem cells and engineering of immune 
cells for the treatment of childhood cancer. CRISPR-Cas9 now offers a 
simpler route to realizing this potential, and it has been adopted widely 
because its DNA-binding and DNA-modifying capabilities are easy to 
program. Cas9 is a protein that assembles with a guide RNA, supplied 
either as separate crRNA and tracrRNA components or as a chimeric 
single-guide RNA (sgRNA), to create a molecular entity that is capable 
of binding and cutting DNA. Notably, DNA binding occurs at a 
20-base-pair DNA sequence that is complementary to a 20-nucleotide 
sequence in the guide RNA and that can be readily altered by the 
researcher (Fig. 2). The DNA-recognition site must be adjacent to a short 
motif (the protospacer adjacent motif or PAM) that acts as a switch, 
triggering Cas9 to make a double-stranded DNA break within the target 
sequence. In cells of all multicellular organisms, including humans, such 
double-stranded DNA breaks induce DNA repair by endogenous cellular 
pathways that can introduce alterations to the DNA sequence, including 
small sequence changes or genetic insertions. Although 
CRISPR-Cas9-induced genome editing is effective in almost all cell types, 
controlling the exact editing outcome remains a challenge in the field, 
as discussed below. 
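
The targeting rule described above (a 20-base-pair site next to a PAM) can be sketched as a simple sequence search. This is a minimal illustration, not a guide-design tool: the PAM pattern NGG is the one commonly cited for S. pyogenes Cas9 and is not stated in the text, the function names are hypothetical, and off-target scoring is ignored entirely.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_target_sites(dna, protospacer_len=20, pam_pattern="[ACGT]GG"):
    """List candidate sites as (strand, start, protospacer, pam).

    For the '-' strand, `start` is a coordinate on the reverse complement.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})({pam_pattern}))")
    sites = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


# Made-up sequence containing one 20-nt site followed by a TGG PAM.
example = "AAGATTACAGATTACAGATTACTGGAA"
for site in find_target_sites(example):
    print(site)
```

The 20-nucleotide protospacer returned for each site is the sequence that the guide RNA would be programmed to match.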

Although the Cas9 of Streptococcus pyogenes (SpCas9) is the enzyme 
that is most commonly used for genome editing and genetic manipu- 
lation using CRISPR-Cas, a growing collection of natural and engi- 
neered Cas9 homologues and other CRISPR-Cas RNA-guided enzymes 
is expanding the genome-manipulation toolbox. It is the intrinsic 
programmability that is present in this diversity of enzymes that under- 
scores the utility of CRISPR-Cas technology for genome editing and 
other applications including gene regulation and diagnostics (Fig. 2). 

For safe and effective clinical use ex vivo and in vivo, genome editing 
needs to be accurate, efficient and deliverable to the desired cells or tis- 
sues. CRISPR-Cas9-generated DNA cleavage induces genome editing 
during double-stranded DNA break repair by non-homologous end join- 
ing and/or homology-directed repair (Fig. 2). Homology-directed repair, 
which requires the presence of a DNA template, is in most cases used 
by the cell less frequently than non-homologous end joining. Furthermore, 
both types of repair can happen in the same cell, creating different 
alleles of an edited gene. Two concurrent double-stranded DNA breaks 
can induce chromosomal translocations. For these reasons, an active 
area of CRISPR-Cas technology development involves controlling DNA 
repair outcomes to ensure that the desired genetic change is introduced. 

Alternatives to DNA-cleavage-induced editing include using 
CRISPR-Cas9 to directly alter the chemical sequence (base editing), 
to generate RNA templates for gene alteration (prime editing) 
and for transcriptional control (CRISPR interference and 
CRISPR activation) (Fig. 3). In addition, it may be possible to control 
gene outputs through Cas9-mediated epigenetic modification. 
Although these methods have been used in cultured cells, they will not 
be ready for clinical use until matters of specificity and delivery 
are addressed. 

Fig. 2 | The genome editing toolbox. a-c, Most well-validated CRISPR-based 
tools perform one of three functions: genome editing (a), base editing (b) or 
gene regulation (c). These systems rely on RNA-guided Cas9 or Cas12a to target 
specific genomic sites. These techniques edit the target site by direct cleavage 
of one or both nuclease active sites, triggering cellular DNA repair by 
non-homologous end joining or homology-directed repair, and/or by relying on 
fused effector proteins. a, CRISPR-Cas9 generates a double-stranded break 
(DSB) at the target site to stimulate endogenous DNA repair. These 
double-stranded breaks are resolved by the endogenous cellular repair machinery, 
resulting in one of two main outcomes at the cut site: an insertion or deletion, 
or the insertion of or replacement with donor DNA that is delivered at the same 
time. b, A fused domain replaces a single base through deamination and DNA 
replication or repair. This single base change is propagated to the 
complementary strand of DNA. Changes include C to U (uracil), which is 
swapped to a T during replication or repair, and A to I (inosine), which is treated 
as a G. bp, base pair; KO, knockout. c, CRISPR-mediated gene repression or 
interference (CRISPRi) sterically blocks the RNA polymerase and induces 
heterochromatinization, leading to direct epigenetic modifications such as 
DNA methylations or RNA targeting by modifying individual bases or RNA 
cleavage. CRISPR-mediated gene activation (CRISPRa) recruits the 
transcription machinery to increase expression of the target region and leads 
to direct epigenetic modifications such as histone acetylation. 

Two strategies to mitigate or cure sickle cell disease take advantage 
of demonstrated strategies for site-specific genome editing (Figs. 1, 2). 
The first involves the restoration of the wild-type HBB gene sequence 
by homology-directed repair. The second approach is to activate 
expression of γ-globin, the fetal form of haemoglobin that is typically 
silenced in adult cells, by disrupting γ-globin repressors or their 
binding sites in the promoter of the γ-globin (HBG1/HBG2) genes. 
These genome-editing strategies require the collection of a patient’s 
haematopoietic stem and progenitor cells, either to correct the mutation 
in HBB or to restart expression of γ-globin, and the subsequent 
reintroduction of the edited cells into the bone marrow. Major progress 
in the delivery and handling of haematopoietic stem and progenitor 
cells has resulted in impressive efficiencies of mutation correction or 
mitigation that are expected to be curative. 

Such an approach, although it requires a bone-marrow transplanta- 
tion, would remove the need for a compatible bone-marrow donor and 
thus provide a path for treating and potentially curing many more peo- 
ple than can currently be treated. As discussed below, improvements 
in in vivo delivery technology may one day enable treatment without 
requiring bone-marrow transplantation, which would reduce both 
expense and patient hardship. 

Whereas in vivo editing may resolve some of the issues with ex vivo 
sickle cell therapies, studies in DMD illustrate that other challenges arise 
when attempting in situ gene correction. Three reports have highlighted 
both the tremendous potential of genome editing and the considerable 
challenges that remain before genome editing can be used to treat or 
cure muscular dystrophy in humans. In the first study, a mouse model of 
DMD was created using CRISPR-Cas9 to generate a common deletion 
(ΔEx50) in the Dmd gene that also occurs in patients with DMD. The 
severe muscle dysfunction in the ΔEx50 mice was corrected by systemic 
delivery of an adeno-associated virus (AAV) that encoded the 
CRISPR-Cas9 genome-editing components, restoring up to 90% of 
dystrophin protein expression throughout the skeletal muscles and 
hearts of ΔEx50 mice. The second study used CRISPR-Cas9-mediated 
genome editing to remove a mutation in exon 23 in the mdx mouse 
model of DMD, providing partial recovery of functional dystrophin 
protein in skeletal myofibres and cardiac muscle. In the third study, 
dogs with the ΔEx50 mutation, which corresponds to a mutational 
‘hotspot’ in the human DMD gene, were treated using CRISPR-Cas9. 
After virus-mediated systemic delivery in skeletal muscle, dystrophin 
levels were restored to 3-90% of normal, and the appearance of the 
muscle tissue in treated dogs was improved. Although promising, these 
reports, as well as early-stage data from patients treated with in vivo 
gene editing using ZFNs, highlight the gap between animal studies 
and applications in humans and underscore the need for improved 
methods for in situ delivery, as discussed in the next section. An 
early-stage clinical trial in which in vivo CRISPR-Cas9 delivery to the eye 
is used to treat congenital blindness and a close-to-the-clinic program 
for liver gene editing will soon provide key first-in-human data to 
inform the direction of that effort. 


Towards tissue-specific delivery 


For any of these genome-editing methods to be useful clinically, the 
CRISPR-Cas enzymes, associated guide RNAs and any DNA repair tem- 
plates must make their way into the cells that are in need of genetic repair. 
To produce a functional genome-editing complex, Cas9 and sgRNA can 
be introduced to cells in target organs in formats that include DNA, mRNA 
and sgRNA, or protein and sgRNA. All three formats are currently, or will 
soon be, used in the clinic, using viral vectors, nanoparticles and 
electroporation of protein-RNA complexes, and each has distinct benefits 
and limitations (Table 1). The currently favoured form of ex vivo delivery 
to primary cells is electroporation of Cas9 as a preformed protein-RNA 
(ribonucleoprotein (RNP)) complex. In vivo delivery, which is much 
more challenging, is currently conducted using viral vectors (typically 
AAVs) or lipid nanoparticles bearing Cas9 mRNA and an sgRNA. The 
difficulty of ensuring efficient, targeted delivery into desired cells in 
the body currently limits the clinical opportunities of in vivo genome 
editing, although this is an area of increasing research and development. 

Viral delivery vehicles, including lentiviruses, adenoviruses and AAVs, 
offer advantages of efficiency and tissue selectivity (Table 1). AAVs are 
attractive because of the reduced risk of genomic integration, inherent 
tissue tropism and clinically manageable immunogenicity. In addition, 
long-term expression of trans-genes that encode Cas9 and sgRNA from 
the episomal viral genome could help to boost genome-editing 
efficiency in patients, such as individuals with DMD as discussed below. 
Notably, the FDA has approved the use of AAVs for gene-replacement 
therapy in patients with spinal muscular atrophy and congenital 
blindness, and clinical trials are in progress. 

There are, however, considerable challenges to using AAVs for 
the therapeutic delivery of CRISPR-Cas components. First, the AAV 
genome can only encode around 4.7 kilobases (kb) of genetic cargo, less 
than other viral vectors and not much larger than the 4.2-kb length of 
the gene that encodes S. pyogenes Cas9. As a result, for applications that 
necessitate the insertion of a corrective gene, a second AAV vector that 
encodes the sgRNA and a template sequence for homology-directed 
DNA repair must be used, reducing efficiency owing to the need for 
cells to acquire both AAV vectors at once. Smaller genome-editing 
proteins, such as the Cas9 of Staphylococcus aureus or Campylobacter 
jejuni and other newly identified CRISPR-Cas enzymes, may circumvent 
this issue. Second, long-term expression of genome-editing molecules 
may expose patients to undesired off-target editing or immune 
reactions. Third, the production of AAVs at scale and the use of 
good manufacturing process methods at affordable cost for clinical 
use remain formidable challenges. 
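
The cargo constraint in the first point is simple arithmetic, sketched below. Only the 4.7-kb capacity and the 4.2-kb SpCas9 gene length come from the text; the sizes given for the sgRNA cassette and repair template are hypothetical placeholders chosen to show why a second vector becomes necessary.

```python
AAV_CAPACITY_KB = 4.7  # approximate cargo capacity quoted in the text
SPCAS9_GENE_KB = 4.2   # length of the S. pyogenes Cas9 gene quoted in the text


def fits_in_one_vector(component_sizes_kb, capacity_kb=AAV_CAPACITY_KB):
    """Return (fits, headroom_kb) for a set of components in a single vector."""
    total = sum(component_sizes_kb.values())
    return total <= capacity_kb, round(capacity_kb - total, 2)


cas9_only = {"SpCas9": SPCAS9_GENE_KB}

# The two extra sizes below are assumed values, not measurements.
cas9_plus_repair = {
    "SpCas9": SPCAS9_GENE_KB,
    "sgRNA cassette": 0.4,
    "repair template": 1.0,
}

print(fits_in_one_vector(cas9_only))
print(fits_in_one_vector(cas9_plus_repair))
```

With these assumed sizes, the Cas9 gene alone leaves about half a kilobase of headroom, whereas adding the guide and template exceeds the capacity, which is the situation that forces a two-vector design.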

Nanoparticles offer an alternative to virus-based delivery of Cas9 
and sgRNAs and are suitable for delivering genome-editing compo- 
nents in the form of DNA, mRNA or RNPs (Table 1). For example, the 
delivery of lipid-mediated nanoparticles has been used to transport 
CRISPR-Cas components in the form of either mRNA and sgRNA or 
preassembled RNPs into tissues. When combined with a highly 
anionic sgRNA, the cationic Cas9 protein forms a stable RNP complex 
that has anionic properties suitable for encapsulation by cationic lipid 
nanoparticles, potentially enabling delivery into cells through 
endocytosis and macropinocytosis. Cationic lipid-based delivery is a 
relatively easy, low-cost process for delivering CRISPR components into cells. 
This approach has been used for one-shot delivery of Cas9 RNPs into 
mice to achieve therapeutically useful levels of genome editing in the 
liver. Disadvantages of this approach include marked toxicity of the 
lipid-mediated nanoparticles and the potentially undesired selectivity 
of cell-type-specific uptake of the particles. 

Inorganic nanoparticles are another type of delivery vehicle with 
advantages that include tunable size and surface properties. Gold 
nanoparticles, in particular, are attractive materials for molecular 
delivery because of the intrinsic affinity of gold for sulfur, enabling 
functionalized molecules to be coupled to the gold particle surface. 
Gold nanoparticles were used originally for nucleic acid delivery by 
conjugating to thiol-linked DNA or RNA. Cas9 protein-sgRNA 
complexes can be incorporated by assembly with DNA-linked particles. 
Such assemblies, complexed with polymers capable of disrupting 
endosomes and including DNA templates for homology-directed 
repair, were found to promote correction of Dmd gene mutations in 
mice. Ongoing research continues to advance nanoparticle delivery 
technology, such as for endothelial cells that could enable access to 
the lungs and other organs. 

Strategies for non-viral cellular delivery of CRISPR-Cas components 
include electroporation, which involves pulsing cells with high-voltage 
currents that create transient nanometre-sized pores in the cell mem- 
brane. This process allows negatively charged DNA or mRNA molecules 
or CRISPR-Cas RNPs to enter the cells. Although it is used mainly 
for Cas9-sgRNA delivery to cells ex vivo, electroporation 
has also been used successfully for Cas9 delivery to animal zygotes, 
and to introduce CRISPR-Cas constructs directly into the skeletal 
muscle in mice, resulting in restoration of Dmd gene expression. 
Electroporation will likely be of limited utility for most in vivo 
genome-editing applications because of its impracticality. 


Table 1 | Methods for delivering genome-editing tools 

Features and applications 
  Nanoparticles: Cationic lipid polymers can be used to encapsulate molecular cargo, facilitating cellular entry. 
  Viruses: AAVs are the most commonly used clinical delivery vehicle for gene therapy. 
  RNPs: Purified protein and guide RNA can be electroporated into stem cells extracted from patients to treat blood disorders such as sickle cell disease. 

Size 
  Nanoparticles: 50-500 nm 
  Viruses: 20 nm 
  RNPs: 12 nm 

Payload 
  Nanoparticles: mRNA, DNA, RNP (from most to least commonly used) 
  Viruses: DNA 
  RNPs: Preformed enzyme complexes 

Advantages 
  Nanoparticles: Inexpensive and relatively easy to produce; no genomic integration; low immunogenicity 
  Viruses: Broad tissue targeting possibilities; efficient; clinically established method 
  RNPs: No genomic integration; no long-term expression and fewer off-target effects 

Disadvantages 
  Nanoparticles: Limited capacity for tissue targeting 
  Viruses: Limited cargo size; undesired integration risk; sustained expression can lead to off-target effects; immunogenicity; high cost and manufacturing challenges 
  RNPs: Will not enter cells without engineering or additional reagents; potential immunogenicity in vivo; unprotected RNPs are at risk of degradation 

Targets 
  Nanoparticles: Liver 
  Viruses: Liver, eyes, brain, lungs and muscle 
  RNPs: Oocytes, stem cells and T cells 

The three main delivery strategies that could be used for clinical genome-editing applications are nanoparticles, viruses and purified RNPs. The approaches vary in important ways, which 
generally limit their suitability for editing to specific cell or tissue types. 


Another non-viral delivery method is the direct application of pre- 
assembled CRISPR-Cas RNPs, with or without chemical modifications 
to assist cell penetration of cultured cells or organs. This delivery mode 
can reduce possible off-target mutations relative to delivering 
Cas9-encoding DNA or mRNA because of the short half-life of RNPs. New 
strategies for the direct delivery of CRISPR-Cas9 RNP complexes 
continue to emerge, including those using molecular engineering 
to enhance the targeting of specific cell types and to increase the 
efficiency of cell penetration. 

Delivery remains perhaps the biggest bottleneck to somatic-cell 
genome editing, a reality that has motivated increasing effort across 
different disciplines. Emerging strategies that may have substantial 
impact on the clinical use of genome editing include advances in 
nanoparticle- and cell-based delivery methods as well as approaches that 
involve red blood cells and nanowires. 


Accuracy, precision and safety of genome editing 


The clinical utility of genome editing depends fundamentally on accu- 
racy and precision. Accuracy refers to the ratio of on- versus off-target 
genetic changes, whereas precision relates to the fraction of on-target 
edits that produce the desired genetic outcome. Inaccurate (off-target) 
genome editing occurs when CRISPR-induced DNA cleavage and repair 
happens at genome locations not intended for modification, typically 
sites that are close in sequence to the intended editing site”’. Impre- 
cise genome editing results from different modes of DNA repair after 
on-target DNA cleavage, such as a mixture of non-homologous end- 
joining and homology-directed recombination events that produce 
different sequences at the desired editing location in different cells. 
In addition, large deletions and complex genomic rearrangements 
have been observed after genome editing in mouse embryonic cells, 
haematopoietic progenitor cells and human immortalized epithelial 
cells”? °°. Although these events occur at low frequency, they could be 
important ina clinical setting if rare translocations led to cancer” *°, 
Careful testing will be required to detect and monitor both the accuracy 
and precision of genome editing in clinical settings and ultimately to 
reduce or eliminate undesired events by controlling target site recogni- 
tion and DNA repair outcomes. The National Institute of Standards and 
Technology manages ascientific consortium that aims to measure and 
standardize such outcomes as genome-editing technology advances”. 
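
The two definitions above can be made concrete with a small calculation. The sketch below is illustrative only: the `EditCounts` container, its field names and the example counts are hypothetical, and real assays report these quantities per site and with uncertainties.

```python
from dataclasses import dataclass


@dataclass
class EditCounts:
    """Hypothetical tallies from a sequencing assay of edited cells."""
    on_target_desired: int  # on-target edits carrying the intended sequence
    on_target_other: int    # on-target edits with any other repair outcome
    off_target: int         # edits found at unintended genomic sites


def accuracy_ratio(c: EditCounts) -> float:
    """Accuracy as defined in the text: on-target edits per off-target edit."""
    on_target = c.on_target_desired + c.on_target_other
    if c.off_target == 0:
        return float("inf")  # no off-target edits were observed
    return on_target / c.off_target


def precision_fraction(c: EditCounts) -> float:
    """Precision as defined in the text: share of on-target edits with the desired outcome."""
    on_target = c.on_target_desired + c.on_target_other
    if on_target == 0:
        raise ValueError("no on-target edits to evaluate")
    return c.on_target_desired / on_target


counts = EditCounts(on_target_desired=720, on_target_other=180, off_target=9)
print(accuracy_ratio(counts))      # 900 on-target edits per 9 off-target edits
print(precision_fraction(counts))  # 720 of 900 on-target edits are as intended
```

Keeping the two numbers separate mirrors the distinction drawn in the text: an edit can be accurate (at the right site) and still imprecise (the wrong repair outcome).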

The risks intrinsic to DNA-cleavage-induced genome editing 
have spurred the development of CRISPR-Cas9-mediated genome 


regulation or editing methods that do not involve double-stranded DNA 
cutting. CRISPR interference and CRISPR activation both use catalyti- 
cally deactivated forms of Cas9 (dCas9) that are fused to transcriptional 
repressors or activators?”. Similarly, CRISPR-Cas9-mediated epigenetic
modification to control gene expression is also under development’.
An alternative approach is to use CRISPR-Cas9 coupled to
DNA-editing enzymes that catalyse targeted A-to-G or C-to-T genomic 
sequence changes without inducing a break in the DNA, potentially 
reversing pathogenic single-nucleotide changes or disabling genes 
through the introduction of a stop codon*”°. CRISPR-Cas9 can also be 
linked to reverse transcriptase and used for targeted template-directed 
sequence alterations'™. All of these strategies—although elegant in 
principle—involve large chimeric proteins that pose additional chal- 
lenges of delivery into primary cells or animals. The specificity of action, 
both at the target site and genome-wide, remains an area of active 
investigation. Issues of delivery, potency and specificity of CRISPR 
interference, CRISPR activation and CRISPR-mediated base editing 
and prime editing will need to be thoroughly addressed before they 
are ready for clinical use. 
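
To illustrate how a C-to-T change can disable a gene by introducing a stop codon, the sketch below scans a coding sequence for codons that a single C-to-T substitution on the coding strand converts into a stop codon. It is a simplified illustration: the function name and the example sequence are hypothetical, and real base editors are further constrained by an editing window and PAM placement, which are ignored here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def c_to_t_stop_sites(cds: str):
    """Return (codon_index, original_codon, edited_codon) for each in-frame codon
    that a single C-to-T change on the coding strand turns into a stop codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    hits = []
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon in STOP_CODONS:
            continue  # already a stop codon
        for pos, base in enumerate(codon):
            if base != "C":
                continue
            edited = codon[:pos] + "T" + codon[pos + 1:]
            if edited in STOP_CODONS:
                hits.append((i // 3, codon, edited))
    return hits


# Hypothetical open reading frame: ATG CAG GCC CGA TGG CAA TAA
example_cds = "ATGCAGGCCCGATGGCAATAA"
for index, before, after in c_to_t_stop_sites(example_cds):
    print(f"codon {index}: {before} -> {after}")
```

Only the codons CAA, CAG and CGA can be converted this way on the coding strand, which is why the scan reports a small subset of positions.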

Other factors that affect clinical applications of genome editing 
include the immunogenicity of bacterially derived editing proteins, 
the potential for pre-existing antibodies against CRISPR components 
to cause inflammation and the unknown long-term safety and stability 
of genome-editing outcomes. Immunogenicity of CRISPR-Cas proteins 
could be managed by high-efficiency one-time editing treatments 
and by using different editing enzymes. Pre-existing Cas9 antibodies 
and reactive T cells have been detected in humans exposed to path- 
ogenic bacteria that have CRISPR systems, although it is unknown 
whether these are present at sufficiently high concentrations to trigger
an immune response to the genome-editing enzymes*”™. Notably,
genome-editing therapies that involve ex vivo editing, such as for sickle
cell disease, are not as affected by either immunogenicity or pre-existing
CRISPR-Cas antibodies, as the natural decay of residual Cas9 protein
in the ex vivo edited cells minimizes Cas9 exposure. The potential
for inadvertent selection of genome-edited cells with undesired genetic 
changes came to light with the observation that selection for inactiva- 
tion of the p53 pathway, which is associated with rapid cell growth 
and cancer, can occur during laboratory experiments on cells that 
are not used clinically’**!°°. Subsequent experiments showed that p53 
inactivation can be controlled or avoided through protocol optimization*””°*.
As for the long-term safety and efficacy of genome-edited cells
in vivo, much remains to be determined. However, the recent report
of a single HIV-positive patient who received CRISPR-Cas9-edited
haematopoietic progenitor cells showed that although the number of
edited cells was too low to mitigate HIV infection, no adverse outcome
was detected more than 19 months after transplantation of the edited 
cells'””. Together, these findings suggest that there are, at present, no 
known insurmountable hurdles to the eventual development of safe 
and effective clinical applications of genome editing in humans. 


Therapeutic genome editing 


The clinical potential of genome editing exemplified by applications 
in sickle cell disease, muscular dystrophy and other monogenetic dis- 
orders could be stymied by extreme pricing of such next-generation 
therapeutics. Although CRISPR technology itself is a democratizing 
tool for scientists, extension of its broad utility in biomedicine requires 
addressing the costs of development, personalization for individual 
patients and the intrinsic difference between a chronic disease treat- 
ment versus a one-and-done cure’. 

Current clinical trials using the CRISPR platform aim to improve chi- 
meric antigen receptor (CAR) T cell effectiveness, treat sickle cell disease 
and other inherited blood disorders, and stop or reverse eye disease’. In 
addition, clinical trials to use genome editing for degenerative diseases 
including for patients with muscular dystrophy are on the horizon. For 
sickle cell disease, the uniform nature of the underlying genetic defect 
lends itself to correction by a standardized CRISPR modality that could
be used in many if not most patients. This simplifies clinical testing but 
also makes the need to address patient cost and access more acute, given 
that the approximately 100,000 US patients and millions of individuals 
in African and Asian countries will be candidates for treatment. 

For muscular dystrophy, the genetic diversity among patients lends 
itself to personalization, which is an inherent strength of the CRISPR 
genome-editing platform; however, it also complicates clinical testing 
strategies. In addition, progressive diseases such as muscular dystro- 
phy require early treatment to be most effective, raising questions 
about coupling diagnosis and treatment. Beyond these examples, many 
rare genetic disorders will be treatable, in principle, if a streamlined
strategy for CRISPR therapeutic development can be implemented’. 
With its potential to address unmet medical needs, the clinical use of 
genome editing will ideally spur changes to regulatory guidelines and 
cost reimbursement structures that will benefit the field more broadly 
as these therapies continue to advance. 

Notably, all of the genome-editing therapeutics under develop- 
ment aim to treat patients through somatic cell modification. These 
treatments are designed to affect only the individual who receives 
the treatment, reflecting the traditional approach to disease mitiga- 
tion. However, genome editing offers the potential to correct disease- 
causing mutations in the germline, which would introduce genetic 
changes that would be passed on to future generations. The scientific 
and societal challenges associated with human germline editing are 
distinct from somatic cell editing and are discussed in the next section. 


Heritable genome editing 

Human germline genome editing can introduce heritable genetic 
changes in eggs, sperm or embryos. Germline genome editing is 
already in widespread use in animals and plants, and has been used 
in human embryos for research purposes. A report of alleged use of 
human embryo editing that resulted in the birth of twin baby girls 
with edited genomes has focused global attention on an application 
of genome editing that must be rigorously regulated, as underscored 
by international scientific organizations. 

Human germline editing differs from somatic cell editing because
it results in genetic changes that are heritable if the edited cells are
used to initiate a pregnancy (Fig. 4). Germline editing has been used
for years in animals, including mice, rats, monkeys and many others,
and experiments show that it can also be done in both nonviable and
viable human embryos’”’ "”. Although none of the published work
involves implantation of the edited embryos to initiate a pregnancy,
such clinical work was reported at a conference on human genome
editing in November 2018, leading to international condemnation in
light of clear violations of ethical and scientific guidelines.

Fig. 4 | Editing the human germline. Genomic changes made during or after
embryogenesis may be found in some (mosaic) or all of the cells of the child,
including the germline. In contrast to somatic editing (Fig. 1), germline-edited
humans can pass these edits down to subsequent generations. In the first
human germline-editing experiment in embryos carried to term, the stated
goal was to confer HIV resistance, making this example relevant to the real
world and highlighting the potentially problematic nature of this technique.

This work and the accompanying discussion around human germline 
editing have raised important questions that affect the future direction of 
the science as well as the societal and ethical issues that accompany any 
such applications. First, research using CRISPR-Cas9 in human embryos 
has challenged our current understanding of DNA repair mechanisms 
and the developmental pathways that occur in these cells. A report of 
inaccurate CRISPR-Cas9-based genome editing in non-viable human 
embryos” was not substantiated by later publications, but the mecha- 
nism by which double-stranded DNA breaks are repaired in early human 
embryos remains under debate. Some results were interpreted to indi- 
cate repair of a CRISPR-Cas9-targeted gene allele by homology-directed 
repair with the other allele of the cell as the donor template’. Other 
scientists argued that such repair would be impossible given the apparent
physical separation of sister chromatids early in embryogenesis, and
suggested that the data could also be consistent with large deletions 
in the embryo genomes”. Resolving this fundamental question will 
require further experiments. Human embryo editing has also begun to 
reveal differences in the genetics of early development between mice 
and humans”, underscoring the potential value of research that will be 
enabled by precision genome modification. 

A second question raised by applications of genome editing in human
embryos concerns the appropriate professional and societal response. 
Organizations including the National Academy of Sciences, the National 
Academy of Medicine, the Royal Society and their equivalents in other 
countries have sponsored meetings and reports, as have professional 
societies including the American Society of Human Genetics’», UK 
Association of Genetic Nurses and Counsellors, Canadian Associa- 
tion of Genetic Counsellors, International Genetic Epidemiology 
Society, US National Society of Genetic Counselors, American Soci- 
ety for Reproductive Medicine, Asia Pacific Society of Human Genet- 
ics, British Society for Genetic Medicine, Human Genetics Society of 
Australasia, Professional Society of Genetic Counselors in Asia, and 
Southern African Society for Human Genetics. These groups agree 
on a number of key points. First, at this time, given the nature and
number of unanswered scientific, ethical and policy questions, it is 
inappropriate to perform germline genome editing that culminates in 
human pregnancy. Second, in vitro germline genome editing on human 


embryos and gametes should be allowed, with appropriate oversight 
and consent from donors, to facilitate research on the possible future 
clinical applications of gene editing, and there should be no prohibition 
on public funding of this research. Third, future clinical applications 
of human germline genome editing should not proceed unless, at a 
minimum, there is (a) a compelling medical rationale, (b) an evidence
base that supports its clinical use, (c) an ethical justification and (d) a 
transparent public process to solicit and incorporate stakeholder input. 

The third question raised by applications of CRISPR-Cas9 in human 
embryos is how to move the technology forward while ensuring responsible
use. At the time of writing, international commissions convened
by the World Health Organization (WHO) and by the US National Acad- 
emy of Sciences and National Academy of Medicine, together with the 
Royal Society, are drafting detailed requirements for any potential 
future clinical use. Medical needs must be defined so that risks versus 
possible benefits can be evaluated. Most importantly, procedures by 
which patients could be informed about the technology, its risks and 
a process for monitoring health outcomes must be determined. 


Outlook 


Therapeutic genome editing will be realized, at least for some diseases, 
over the next 5-10 years. This profound opportunity to change health- 
care for many people requires scientists, clinicians and bioethicists 
to work with healthcare economists and regulators to ensure safe, 
effective and affordable outcomes. The potential impact on patients 
is too important to wait. 

For the past 150 years, the prevailing view of the local interstellar medium has been
based on a peculiarity known as the Gould Belt’ “, an expanding ring of young stars,
gas and dust, tilted about 20 degrees to the Galactic plane. However, the physical
relationship between local gas clouds has remained unknown because the accuracy in
distance measurements to such clouds is of the same order as, or larger than, their
sizes> ’. With the advent of large photometric surveys® and the Gaia astrometric survey”,
this situation has changed”®. Here we reveal the three-dimensional structure of all
local cloud complexes. We find a narrow and coherent 2.7-kiloparsec arrangement of
dense gas in the solar neighbourhood that contains many of the clouds thought to be
associated with the Gould Belt. This finding is inconsistent with the notion that these
clouds are part of a ring, bringing the Gould Belt model into question. The structure
comprises the majority of nearby star-forming regions, has an aspect ratio of about
1:20 and contains about three million solar masses of gas. Remarkably, this structure
appears to be undulating, and its three-dimensional shape is well described by a
damped sinusoidal wave on the plane of the Milky Way with an average period of
about 2 kiloparsecs and a maximum amplitude of about 160 parsecs.


To reveal the physical connections between clouds in the local interstellar
medium (ISM), we determined the three-dimensional (3D) distribution
of all local cloud complexes" by deriving accurate distances
to about 380 lines of sight. The lines of sight were chosen to include
not only all known local clouds!” but also potential bridges between
them, as traced by lower-column-density gas. Figure 1 presents the
distribution of lines of sight studied towards the Galactic anti-centre
and illustrates our overall approach. Each line of sight covers an area in
the sky of about 450 arcmin² and includes both foreground and background
stars for a particular direction towards a cloud. The distances
and the colours of these stars are used to compute a distance to the
cloud (see Methods).

In the interactive figure in Supplementary Information we present 
the distribution of cloud distances to all of the studied lines of sight 
in a Cartesian XYZ frame where X increases towards the Galactic centre,
Y increases along the direction of rotation of the Galaxy and Z
increases upwards out of the Galactic plane. In the X-Y projection (a
top-down view of the Galactic disk), it is clear that cloud complexes
are not randomly distributed, but instead tend to form elongated and
relatively linear arrangements. Surprisingly, we find that one of the
nearest structures, at about 300 pc from the Sun at its closest point,
is exceptionally straight and narrow in the X-Y plane. This straight
structure: (1) undulates systematically in the Z axis for about 2.7 kpc
on the X-Y plane, (2) is co-planar in essentially its entire extent and (3)
displays radial velocities” indicating that the structure is not a random
alignment of molecular cloud complexes but a kinematically coherent
structure. We find that this structure is well modelled as a damped 
sinusoidal wave. The red points in Fig. 2 were selected by the fitting 
procedure, by explicitly modelling inliers and outliers. We tested the 
validity of the model by modelling the ‘tenuous connections’ separately 
and confirming that they meet the same inlier criteria that were first 
applied to the major clouds. For more details on the statistical model- 
ling, see Methods. 
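
As a minimal sketch of the Cartesian frame used above, the function below converts a Galactic longitude, latitude and distance into X, Y and Z. It assumes a heliocentric frame (Sun at the origin) and the standard convention that longitude 0° points to the Galactic centre and 90° points along the direction of rotation; the function name and the example coordinates are hypothetical.

```python
import math


def galactic_to_cartesian(l_deg: float, b_deg: float, d_pc: float):
    """Convert Galactic longitude/latitude (degrees) and distance (pc)
    to heliocentric Cartesian X, Y, Z (pc).

    X increases towards the Galactic centre (l = 0 deg),
    Y increases along Galactic rotation (l = 90 deg),
    Z increases towards the north Galactic pole (b = +90 deg).
    """
    l = math.radians(l_deg)
    b = math.radians(b_deg)
    x = d_pc * math.cos(b) * math.cos(l)
    y = d_pc * math.cos(b) * math.sin(l)
    z = d_pc * math.sin(b)
    return x, y, z


# Hypothetical line of sight below the plane, towards the anti-centre side.
x, y, z = galactic_to_cartesian(l_deg=209.0, b_deg=-19.0, d_pc=420.0)
print(f"X = {x:.0f} pc, Y = {y:.0f} pc, Z = {z:.0f} pc")
```

A cloud with negative latitude lands at negative Z, which is how a structure can be straight in the X-Y projection and still undulate in Z.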

Apart from the continuous undulating 3D distribution, there is 
also very limited kinematic evidence that the structure is physically 
oscillating around the mid-plane of the Galaxy, as any sinusoidal mass 
distribution centred on the Galactic plane should. The Galactic space 
velocities (U, V, W) in the local-standard-of-rest frame for a sample 
of young stellar objects associated with the Orion A cloud near the 
‘trough’ of this structure are (-10.2, -1.2, -0.1) km s⁻¹ (J. Grossschedl,
private communication), implying that Orion A has now reached its 
maximum distance from the Galactic plane before falling back into 
the plane. These observations also indicate that Orion, and probably 
the large structure described here, is moving tangentially with about 
the same speed as the local Galactic disk. 

This spatially and kinematically coherent structure has an amplitude
of roughly 160 pc at its maximum and a period of roughly 2 kpc. We
estimate the mass of the structure to be at least 3 × 10⁶ M☉ (M☉, solar
mass) by integrating the Planck opacity map” for the different cloud
complexes in the structure at their estimated distances. The procedures
used to compute the mass and model the 3D shape of the structure
are described in Methods. We name the structure the Radcliffe Wave
in honour of both the early-20th-century female astronomers from
Radcliffe College and the interdisciplinary spirit of the current Radcliffe
Institute, which contributed to this discovery. The structure can also be
seen at lower resolution in recent all-sky 3D dust maps” ” (see Fig. 2).
A second linear structure, the ‘split’®, is about 1 kpc long and seems to
contain the Sco-Cen, Aquila and Serpens clouds, as well as a previously
unidentified complex. The functional form of the split is different,
however, in that it is largely confined to the Galactic plane over much
of its length and does not seem to be undulating.

Fig. 1 | Sky map of targeted star-forming regions towards the anti-centre of
the Milky Way. The filled circles represent the studied lines of sight used to
determine accurate distances to known nearby star-forming complexes
(the sizes of the region labels are roughly proportional to their distance). The
open circles represent lines of sight towards lower-column-density envelopes
between complexes. The background greyscale map shows the column density
distribution derived from Planck data".

The interactive figure in Supplementary Information, which displays
the 3D location of the Gould Belt’, illustrates that with the improved
distances, this structure is a poor fit to the data, which comprise only
clouds from Sco-Cen and Orion, the traditional anchors of the Gould
Belt. This fact alone challenges the existence of a belt, as two points can
always define a ring. Because four out of five of the Gould Belt clouds
(Orion, Perseus, Taurus, Cepheus) are part of the much larger Radcliffe
Wave, whereas one of the five (Ophiuchus) is part of the split, we
propose that the Gould Belt is a projection effect of two linear cloud
complexes against the sky. Our results provide an alternate explanation
for the 20° inclination of the Gould Belt: it is simply the orientation of
the Radcliffe Wave from trough (Orion) to crest (Cepheus). With these
considerations, the X-Y distribution of local B-stars in these regions
from the 30-year-old Hipparcos satellite” resembles the two elongated
linear structures in Fig. 2 more closely than a ring, bolstering previous
suspicions that the Gould Belt is a projection effect”°.

Fig. 2 | 3D distribution of local clouds. The position of the Sun is marked with ⊙.
The size of the symbols is proportional to the column density. The red points were
selected by a fitting algorithm, as described in Methods. These describe a spatially
and kinematically coherent structure that we term the Radcliffe Wave (possible
models are shown by the grey lines in the bottom-right panel). The greyscale
map in the left panel shows an integrated dust map” (-300 pc < Z < 300 pc),
which indicates that our sample of cloud distances is essentially complete.
To highlight the undulation and co-planarity of the structure, the right panels
show projections in which the X-Y frame has been rotated anticlockwise by 33°
(top, X_prime-Z) and clockwise by 120° (bottom, Y_prime-Z) for an observer facing
the Galactic anti-centre. The 1σ statistical uncertainties on the distance
(usually 1-2%) are represented by line segments that are usually smaller than
the symbols. There is an additional systematic uncertainty on the distance,
which is estimated” to be about 5%. For an interactive version of this figure,
including additional layers not shown here (for example, a model of the Gould
Belt and log-spiral arm fits), see Supplementary Information and
https://faun.rce.fas.harvard.edu/czucker/Paper_Figures/radwave.html.

In Supplementary Information, one can access an interactive dis- 
play of the 3D location of the Local Arm of the Milky Way as traced by 
masers” and investigate the relation between the Radcliffe Wave and 
the Local Arm. The Radcliffe Wave (red points) is about 20% of the width 
and 40% of the length of the Local Arm” and makes up for an important 
fraction of the Local Arm’s mass and number of cloud complexes. On the 
other hand, the Local Arm is much more dispersed and includes local 
complexes that are not part of the Radcliffe Wave (for example, Mon
OB1, California, Cepheus Far and Ophiuchus). Whereas there is excellent
agreement between our distance measurements and maser-defined 
distances”, the log-spiral fit of the maser data crosses the Radcliffe 
Wave at an angle of about 25°. The mismatch between the Radcliffe 
Wave and the log-spiral fit suggests that the Local Arm is more struc- 
tured and complex than previously thought, but is consistent with arms 
being composed of quasi-linear structures on kiloparsec scales”*. 

The origin of the Radcliffe Wave is unclear. The structure is too large 
(and too straight) to have formed by the feedback of a previous generation
of massive stars. More probably, this narrow structure is the outcome
of a large-scale Galactic process of gas accumulation, either from
a shock front in a spiral arm” or from gravitational settling and cooling
on the plane of the Milky Way (Kim, W.-T. & Ostriker, E. C., manuscript
in preparation). Linear kiloparsec-sized structures similar to the one 
presented here have been seen in nearby galaxies” and in numerical 
simulations” of spiral-arm formation. 

The undulation of the Radcliffe Wave is even harder to explain. The 
accretion of a tidally stretched gas cloud settling into the Galactic disk
could in principle mimic the shape and the damped undulation of the
structure, but it requires synchronization with the Galactic rotation
(Orion’s velocity in the local-standard-of-rest frame, V_LSR ≈ 0 km s⁻¹),
which is plausible but seems unlikely. Analogous kiloparsec-sized waves
(or corrugations) have been seen in nearby galaxies” with amplitudes 
similar to the undulation seen in Fig. 2 (ref.”%), but their origins often 
call for perturbers. Identifying possible disruption events, their cor- 
responding progenitors and their relationship to the Radcliffe Wave 
is a substantial challenge that should be explored. 

Our findings call for a revision of the architecture of gas in the solar
neighbourhood and a re-interpretation of phenomena that are generally
associated with the Gould Belt, such as the Lindblad ring and the
Cas-Tau/α-Per populations, among many others”. The Radcliffe Wave
provides a framework for understanding molecular cloud formation 
and evolution. Follow-up work, in particular on the kinematics of this 
structure, will provide insights into the relative roles of gravity, feed- 
back and magnetic fields in star-formation research. 


Online content 


Any methods, additional references, Nature Research reporting sum- 
maries, source data, extended data, supplementary information, 
acknowledgements, peer review information; details of author con- 
tributions and competing interests; and statements of data and code 
availability are available at https://doi.org/10.1038/s41586-019-1874-z. 


1. Herschel, J. F. W. Results of Astronomical Observations Made During the Years 1834, 5, 6,
    7, 8, at the Cape of Good Hope; Being the Completion of a Telescopic Survey of the Whole
    Surface of the Visible Heavens, Commenced in 1825 (Smith, Elder and Company, 1847).
2. Gould, B. A. On the number and distribution of the bright fixed stars. Am. J. Sci. 38,
    325-333 (1874).
3. Bobylev, V. V. The Gould belt. Astrophysics 57, 583-604 (2014).
4. Palouš, J. & Ehlerová, S. in Handbook of Supernovae (eds Alsabti, A. W. & Murdin, P.)
    2301-2311 (Springer, 2016).
5. Maddalena, R. J., Morris, M., Moscowitz, J. & Thaddeus, P. The large system of molecular
    clouds in Orion and Monoceros. Astrophys. J. 303, 375-391 (1986).
6. Lombardi, M., Lada, C. J. & Alves, J. Hipparcos distance estimates of the Ophiuchus and
    the Lupus cloud complexes. Astron. Astrophys. 480, 785-792 (2008).
7. Schlafly, E. F. et al. A large catalog of accurate distances to molecular clouds from ps1
    photometry. Astrophys. J. 786, 29 (2014).
8. Chambers, K. C. et al. The Pan-STARRS1 surveys. Preprint at
    https://arxiv.org/abs/1612.05560 (2016).
9. Brown, A. G. A., Vallenari, A., Prusti, T. & de Bruijne, J. H. J. Gaia Data Release 2: summary
    of the contents and survey properties. Astron. Astrophys. Suppl. Ser. 616, A1 (2018).
10. Zucker, C. et al. A large catalog of accurate distances to local molecular clouds: the Gaia
    DR2 edition. Astrophys. J. 879, 125 (2019).
11. Reipurth, B. (ed.) Handbook of Star Forming Regions, Volume I: The Northern Sky Vol. 4
    (ASP, 2008).
12. Zucker, C. et al. A compendium of distances to molecular clouds in the star formation
    handbook. Astron. Astrophys. 633, A51 (2020).
13. Dame, T. M., Hartmann, D. & Thaddeus, P. The Milky Way in molecular clouds: a new
    complete CO survey. Astrophys. J. 547, 792-813 (2001).
14. Planck Collaboration. Planck 2013 results. XI. All-sky model of thermal dust emission.
    Astron. Astrophys. 571, A11 (2014).
15. Green, G. M. et al. Galactic reddening in 3D from stellar photometry - an improved map.
    Mon. Not. R. Astron. Soc. 478, 651-666 (2018).
16. Lallement, R. et al. Gaia-2MASS 3D maps of Galactic interstellar dust within 3 kpc. Astron.
    Astrophys. 625, A135 (2019).
17. Green, G. M., Schlafly, E. F., Zucker, C., Speagle, J. S. & Finkbeiner, D. P. A 3D dust map
    based on Gaia, Pan-STARRS 1 and 2MASS. Astrophys. J. 887, 93 (2019).
18. Perrot, C. A. & Grenier, I. A. 3D dynamical evolution of the interstellar gas in the Gould
    belt. Astron. Astrophys. Suppl. Ser. 404, 519-531 (2003).
19. Elias, F., Cabrera-Cano, J. & Alfaro, E. J. OB stars in the solar neighborhood. I. Analysis of
    their spatial distribution. Astron. J. 131, 2700-2709 (2006).
20. Bouy, H. & Alves, J. F. Cosmography of OB stars in the solar neighbourhood. Astron.
    Astrophys. Suppl. Ser. 584, A26 (2015).
21. Reid, M. J., Dame, T. M., Menten, K. M. & Brunthaler, A. A parallax-based distance
    estimator for spiral arm sources. Astrophys. J. 823, 77 (2016).
22. Reid, M. J. et al. Trigonometric parallaxes of high mass star forming regions: the structure
    and kinematics of the Milky Way. Astrophys. J. 783, 130 (2014).
23. Honig, Z. N. & Reid, M. J. Characteristics of spiral arms in late-type galaxies. Astrophys. J.
    800, 53 (2015).
24. D’Onghia, E., Vogelsberger, M. & Hernquist, L. Self-perpetuating spiral arms in disk
    galaxies. Astrophys. J. 766, 34 (2013).
25. Goodman, A. A. et al. The bones of the Milky Way. Astrophys. J. 797, 53 (2014).
26. Elmegreen, B. G., Elmegreen, D. M. & Efremov, Y. N. Regularly spaced infrared peaks in the
    dusty spirals of Messier 100. Astrophys. J. 863, 59 (2018).
27. Edelsohn, D. J. & Elmegreen, B. G. Corrugations in galactic discs generated by
    magellanic-type perturbers. Mon. Not. R. Astron. Soc. 287, 947-954 (1997).
28. Matthews, L. D. & Uson, J. M. Corrugations in the disk of the edge-on spiral galaxy IC 2233.
    Astrophys. J. 688, 237-244 (2008).
29. Bally, J. in Handbook of Star Forming Regions Vol. 4 (ed. Reipurth, B.) 459-370
    (Astronomical Society of the Pacific, 2008).


Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in 
published maps and institutional affiliations. 


© The Author(s), under exclusive licence to Springer Nature Limited 2020 


Nature | Vol 578 | 13 February 2020 | 239


Methods 


Distances 

Distances were determined for 326 lines of sight using major local 
molecular clouds and 54 ‘bridging’ lines of sight in between molecular 
clouds coincident with the projected structure of the Radcliffe Wave. 
The methodology used to obtain the distances and the full catalogue 
of lines of sight for the major clouds are presented in complementary 
work!°”. Lines for the major clouds were chosen to coincide
with star-forming regions presented in ref. ", which is considered to
be the most comprehensive resource on individual low- and high- 
mass star-forming regions out to 2 kpc. Lines of sight for the tenu- 
ous connections were chosen in two dimensions to coincide with 
structures (for example, diffuse filamentary ‘bridges’; see Fig. 1) 
that appeared to span the known star-forming regions on the plane 
of the sky without a priori knowledge of their distances. These were 
later used to validate the 3D modelling, which did not incorporate 
these distances. 


Mass 

We estimate the mass of the Radcliffe Wave to be about 3 × 10⁶ M☉ using
the Planck column density map shown in Fig. 1. To estimate the total 
mass, we first define the extent and depth for each complex in Fig. 1 
using the information on the line-of-sight distances. We then integrate 
the column density map using the average distance to each complex. To 
correct for background contamination, which is critical for complexes
closer to the plane, we subtract an average column density per complex 
estimated at the same Galactic latitude. Our resulting mass estimate 
of the Radcliffe Wave is probably an approximate lower limit to the 
true mass of the structure, given that the regions of the wave crossing 
the plane from Perseus to Cepheus and from Cepheus to Cygnus are 
poorly sampled owing to Galactic plane confusion. 
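
The integration described above can be sketched as follows. This is a simplified illustration, not the procedure used for the published estimate: it assumes the map gives a hydrogen-nucleus column density per square pixel, a single distance per complex, and a mean mass per hydrogen nucleus of 1.37 hydrogen masses to account for helium. The function name, the constant `MU` and the example pixel values are all hypothetical.

```python
import math

M_H_G = 1.6735575e-24    # mass of a hydrogen atom (g)
PC_CM = 3.0856775815e18  # one parsec (cm)
M_SUN_G = 1.98847e33     # one solar mass (g)
MU = 1.37                # assumed mean mass per H nucleus, in units of m_H


def cloud_mass_msun(column_densities, background, pixel_arcmin, distance_pc, mu=MU):
    """Mass (solar masses) of one complex from per-pixel column densities.

    column_densities: N_H per pixel (cm^-2) inside the complex boundary
    background:       average N_H (cm^-2) at the same Galactic latitude, subtracted per pixel
    pixel_arcmin:     angular side length of a square pixel (arcmin)
    distance_pc:      adopted average distance to the complex (pc)
    """
    pixel_rad = math.radians(pixel_arcmin / 60.0)
    pixel_area_cm2 = (pixel_rad * distance_pc * PC_CM) ** 2  # small-angle approximation
    excess = sum(max(n - background, 0.0) for n in column_densities)
    return mu * M_H_G * excess * pixel_area_cm2 / M_SUN_G


# Hypothetical two-pixel "complex" at 400 pc with a uniform background.
mass = cloud_mass_msun([2e21, 3e21], background=1e21, pixel_arcmin=5.0, distance_pc=400.0)
print(f"{mass:.1f} solar masses")
```

Because the physical pixel area scales with the square of the distance, the adopted distance to each complex matters as much as the column density itself.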


Kinematics 
We apply the open-source Gaussian fitting package PySpecKit”’ over 
local ¹²CO spectral observations” to obtain the observed velocities
of the star-forming regions shown in Extended Data Fig. 1. For each 
line of sight, we compute a spectrum over the same region that is 
used to compute the dust-based distances. We then fit a single- 
component Gaussian to each spectrum and assign the mean value 
as the velocity. We are not able to derive observed velocities for 
~25% of the sample that either fall outside the boundaries of the 
survey”, have no appreciable emission above the noise thresh- 
old and/or contain spectra that are not well modelled by a single- 
component Gaussian. The spectra that are not well modelled by a 
single-component Gaussian represent about 2% of the lines of sight 
and occur towards the most massive, structured and extinguished 
lines of sight in the sample, suggesting that these spectra could 
contain CO self-absorption features. We have confirmed that these 
more complex spectra do not show evidence of multiple distance 
components. Regardless, because the predicted velocities rely 
only on the estimated cloud distances assuming that they follow 
the ‘universal’ Galactic rotation curve”, not every line of sight in 
Extended Data Fig. 1 has a corresponding observed velocity associ- 
ated with its predicted velocity. 

We compute the background greyscale map in Extended Data Fig. 1 
by collapsing the ¹²CO spectral observations over only the regions that 
are coincident with the cloud lines of sight on the plane of the sky. 




3D modelling 

We model the centre of the Radcliffe Wave using a quadratic function 
with respect to X, Y and Z specified by three sets of ‘anchor points’, 
$(x_0, y_0, z_0)$, $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$. We find that a simpler linear function 
is unable to accurately model the observed curvature in the structure 
and is subsequently disfavoured by the data. 


The undulating behaviour with respect to the centre is described by 
a damped sinusoidal function relative to the X-Y plane with a decaying 
period and amplitude, which we parameterize as 


where $d(t) = \|(x, y, z)(t) - (x_0, y_0, z_0)\| = [(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2]^{1/2}$ 
is the Euclidean distance from the start of the wave parameterized by 
$t$, $d_{\max}$ is the distance at the end of the wave, $A$ is the amplitude, $P$ is the 
period, $\phi$ is the phase, $\delta$ sets the rate at which the amplitude decays 
and $\gamma$ sets the rate at which the period decays. We explored introducing 
an additional parameter to account for rotation around the primary 
axis determined by our quadratic fit, but found that the results were 
entirely consistent with the structure oscillating in the X-Y plane, and 
so excluded this parameter in our final model. 

We assume the distance of each cloud $d_{\rm cloud}$ relative to our model 
to be normally distributed with an unknown scatter $\sigma$ that is roughly 
equivalent to the radius of the wave. To account for different positions 
along the wave, we define this distance relative to the closest 
point as 

$d_{\rm cloud} = \min_t \left( \|(x_{\rm cloud}, y_{\rm cloud}, z_{\rm cloud}) - (x_{\rm wave}, y_{\rm wave}, z_{\rm wave})(t)\| \right)$ (2) 

Finally, we account for structure ‘off’ the wave by fitting a mixture 
model. We assume that a fraction $f$ of clouds are distributed 
quasi-uniformly in a volume of roughly $10^7$ pc$^3$, so that the remaining $1 - f$ is 
part of the wave. We treat $f$ entirely as a nuisance parameter because 
it is completely degenerate with the volume of our uniform outlier 
model, although we have specified it so that the uniform component 
will contribute a ‘minority’ of the fit (<40%). 

Assuming that the distances to each of our $n$ clouds have been derived 
independently, and defining 
$\theta = \{x_0, y_0, z_0, x_1, y_1, z_1, x_2, y_2, z_2, P, A, \phi, \gamma, \delta, \sigma, f\}$, 
the likelihood for a given realization of our 16-parameter 3D model is 


$\mathcal{L}(\theta) = \prod_{i=1}^{n} \left[ (1 - f)\, \mathcal{L}_{{\rm cloud},i}(\theta) + f\, \mathcal{L}_{{\rm unif},i} \right]$ (3) 


where 


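$\mathcal{L}_{{\rm cloud},i}(\theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2} \frac{d_{{\rm cloud},i}^2}{\sigma^2} \right]$ 

$\mathcal{L}_{{\rm unif},i} = 10^{-7}$ (4) 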
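
As a rough illustration of equations (3) and (4), the mixture likelihood can be evaluated in log form as in the sketch below. The function names and the example distances are hypothetical, and the distances $d_{{\rm cloud},i}$ are assumed to have been computed already from equation (2); this is not the authors' code.

```python
import math

# Uniform outlier density from equation (4): 1 / (10^7 pc^3).
L_UNIF = 1e-7


def cloud_likelihood(d_cloud, sigma):
    """Gaussian term of equation (4) for one cloud at distance d_cloud (pc) from the model."""
    norm = math.sqrt(2.0 * math.pi * sigma ** 2)
    return math.exp(-0.5 * (d_cloud / sigma) ** 2) / norm


def log_likelihood(d_clouds, sigma, f):
    """Log of equation (3): sum over clouds of log[(1 - f) L_cloud + f L_unif]."""
    total = 0.0
    for d in d_clouds:
        total += math.log((1.0 - f) * cloud_likelihood(d, sigma) + f * L_UNIF)
    return total


# Hypothetical distances (pc) of three clouds from the wave model.
example = log_likelihood([10.0, 40.0, 500.0], sigma=60.0, f=0.2)
```

Working with the logarithm avoids numerical underflow when the product in equation (3) runs over many clouds.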


We infer the posterior probability distribution $P(\theta)$ of the 3D model 
parameters to be consistent with our cloud distances (excluding all 
bridging features) using Bayes’ theorem: 

$P(\theta) \propto \mathcal{L}(\theta)\, \pi(\theta)$ (5) 

where $\pi(\theta)$ is our prior distribution over the parameters of interest. 
We set our prior $\pi(\theta)$ to be independent for each parameter, on the 
basis of initial fits. The priors on each parameter are described in 
Extended Data Table 1, where $\mathcal{N}(\mu, \sigma)$ is a normal distribution with mean 
$\mu$ and standard deviation $\sigma$ and $\mathcal{U}(a, b)$ is a uniform distribution with 
lower bound $a$ and upper bound $b$. 

We generate samples from $P(\theta)$ with the nested sampling code 
dynesty using a combination of uniform sampling with multi-ellipsoid 
decompositions and 1,000 live points. A summary of the derived properties 
of the Radcliffe Wave is listed in Extended Data Table 2 along 
with their associated 95% credible intervals. Twenty random samples 
from $P(\theta)$ are plotted in Fig. 2 to illustrate the uncertainties in our 
model. 


Using our samples, we associate particular sightlines with the Wave 
by computing the mean odds ratio averaged over our posterior 

$\langle R_i \rangle = \left\langle \frac{(1 - f)\, \mathcal{L}_{{\rm cloud},i}(\theta)}{f\, \mathcal{L}_{{\rm unif},i}} \right\rangle$ 

based on our set of samples. We subsequently classify all objects with 
$\langle R_i \rangle > 1$ as being part of the Radcliffe Wave, which is used as the criterion 
for associating sources in Fig. 2. We find that this condition holds true for 
43% of the sources used to determine our initial model. Our overall 
conclusions do not change if larger, more selective thresholds are chosen. 

As further validation, we subsequently compute $\langle R_i \rangle$ for each of the 
54 bridging lines of sight targeted to follow the projected structure of 
the Radcliffe Wave. We find that all 54 lines of sight satisfy our $\langle R_i \rangle > 1$ 
condition, further confirming the continuous nature of the Wave 
between individual clouds. 
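
The classification step can be sketched as follows. The sketch assumes that the odds ratio for a cloud is the ratio of the wave term to the outlier term of the mixture likelihood in equation (3), and that each posterior sample supplies a model distance, a scatter and an outlier fraction for that cloud; the function names and the tuple layout are hypothetical.

```python
import math

# Uniform outlier density from equation (4): 1 / (10^7 pc^3).
L_UNIF = 1e-7


def odds_ratio(d_cloud, sigma, f):
    """Assumed odds of 'on the wave' versus 'outlier' for one posterior sample."""
    l_cloud = math.exp(-0.5 * (d_cloud / sigma) ** 2) / math.sqrt(2.0 * math.pi * sigma ** 2)
    return (1.0 - f) * l_cloud / (f * L_UNIF)


def mean_odds_ratio(samples):
    """Average the odds ratio over posterior samples given as (d_cloud, sigma, f) tuples."""
    return sum(odds_ratio(d, s, f) for d, s, f in samples) / len(samples)


def on_wave(samples):
    """Apply the threshold used in the text: mean odds ratio greater than 1."""
    return mean_odds_ratio(samples) > 1.0
```

With a scatter of 60 pc, a cloud a few tens of parsecs from the model gives a mean odds ratio far above 1, whereas a cloud several hundred parsecs away falls well below it.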

In addition to the parameters derived above, we estimate the total 
length of the feature in our dataset by computing the line integral 
along our model from the clouds at the endpoints, finding a length of 
2.7 + 0.2 kpc (95% credible interval). The derived physical properties 
of the feature are listed in Extended Data Table 3. 
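
A line integral of this kind can be approximated numerically by summing short straight segments along the curve. The sketch below is only an illustration: it assumes a quadratic Lagrange curve through three anchor points as a stand-in for the centre line (the parameterization actually used is not given here), it ignores the vertical undulation, and the anchor values are hypothetical.

```python
import math


def quadratic_through(p0, p1, p2):
    """Return r(t) on [0, 1] with r(0) = p0, r(0.5) = p1 and r(1) = p2 (Lagrange form)."""
    def r(t):
        w0 = 2.0 * (t - 0.5) * (t - 1.0)
        w1 = -4.0 * t * (t - 1.0)
        w2 = 2.0 * t * (t - 0.5)
        return tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(p0, p1, p2))
    return r


def curve_length(r, n=1000):
    """Approximate the arc length of r(t), t in [0, 1], with n straight segments."""
    points = [r(i / n) for i in range(n + 1)]
    return sum(math.dist(points[i], points[i + 1]) for i in range(n))


# Hypothetical anchor points in pc.
centre = quadratic_through((-1000.0, -900.0, 0.0), (300.0, 0.0, 0.0), (300.0, 1400.0, 0.0))
length_pc = curve_length(centre)
```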


Data availability 


The datasets generated and/or analysed during the current study are 
publicly available on the Harvard Dataverse: the distances to the major 
star-forming clouds are available at https://doi.org/10.7910/DVN/O7L7YZ 
and the tenuous connections at https://doi.org/10.7910/DVN/K16GQX. 
Extended Data Fig. 1 | Position-velocity diagram. a, b, The blue points in a are 
as in Fig. 1 and the orange points in b represent the predicted positions of the 
blue points as if they were following a ‘universal’ Galactic rotation curve. The 
line segments represent 1σ errors, derived from a Gaussian fitting for the 
observed velocities and the distance uncertainties for the predicted velocities, 
and are generally smaller than the symbols. The quasi-linear arrangement in 
velocity of the Radcliffe Wave complexes suggests that the structure is not a 
random alignment of molecular cloud complexes, but a kinematically coherent 
structure. The tentative decoupling between observed and predicted 
velocities also indicates that the Radcliffe Wave is a kinematically coherent 
structure. VLSR, velocity in the local-standard-of-rest frame. 

Extended Data Table 1 | Priors on Radcliffe Wave parameters 


Parameter   Prior                   Parameter    Prior 
x_0         N(−900 pc, 100 pc)      P            N(3,500 pc, 300 pc) 
y_0         N(−900 pc, 100 pc)      A            N(170 pc, 20 pc) 
z_0         N(0 pc, 50 pc)          φ            N(2.9 rad, 0.5 rad) 
x_1         N(300 pc, 100 pc)       ln γ         N(−0.5, 0.5) 
y_1         N(0 pc, 100 pc)         ln δ         N(−0.5, 0.5) 
z_1         N(0 pc, 50 pc)          ln(σ/pc)     U(3.5, 5) 
x_2         N(300 pc, 100 pc)       f            U(0.15, 0.4) 
y_2         N(1,400 pc, 100 pc) 
z_2         N(0 pc, 50 pc) 
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Extended Data Table 2 | Constraints on Radcliffe Wave parameters 
CI, credible interval. 
Median with 95% CI 
35601873 pe 
160135 pe 
2.897) 33 rad 
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6.687320 
62715 pe 
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Extended Data Table 3 | Physical properties of the Radcliffe Wave 


Name          Median with 95% CI 
Length        2.7 ± 0.2 kpc 
Scatter       60 ± 15 pc 
Amplitude     160 ± 30 pc 
Mass          > 3 × 10⁶ M☉ 


Article 


Entanglement of two quantum memories via fibres over dozens of kilometres 


https://doi.org/10.1038/s41586-020-1976-7 

Received: 26 March 2019 
Accepted: 12 November 2019 
Published online: 12 February 2020 


Yong Yu, Fei Ma, Xi-Yu Luo, Bo Jing, Peng-Fei Sun, Ren-Zhou Fang, 
Chao-Wei Yang, Hui Liu, Ming-Yang Zheng, Xiu-Ping Xie, Wei-Jun Zhang, Li-Xing You, 
Zhen Wang, Teng-Yun Chen, Qiang Zhang, Xiao-Hui Bao & Jian-Wei Pan 


A quantum internet that connects remote quantum processors should enable a 
number of revolutionary applications such as distributed quantum computing. Its 
realization will rely on entanglement of remote quantum memories over long 
distances. Despite enormous progress³⁻¹², at present the maximal physical separation 
achieved between two nodes is 1.3 kilometres, and challenges for longer distances 
remain. Here we demonstrate entanglement of two atomic ensembles in one 
laboratory via photon transmission through city-scale optical fibres. The atomic 
ensembles function as quantum memories that store quantum states. We use cavity 
enhancement to efficiently create atom-photon entanglement and we use 
quantum frequency conversion to shift the atomic wavelength to 
telecommunications wavelengths. We realize entanglement over 22 kilometres of 
field-deployed fibres via two-photon interference¹⁷,¹⁸ and entanglement over 50 
kilometres of coiled fibres via single-photon interference. Our experiment could be 
extended to nodes physically separated by similar distances, which would thus form a 
functional segment of the atomic quantum network, paving the way towards 
establishing atomic entanglement over many nodes and over much longer distances. 


Establishing remote entanglement is a central theme in quantum 
communication. So far, entangled photons have been distributed 
over long distances both in optical fibres and in free space with the 
assistance of satellites. In spite of this progress, the distribution 
succeeds only with an extremely low probability owing to severe transmission 
losses and because photons have to be detected to verify their 
survival after transmission. Therefore the distribution of entangled 
photons has not been scalable to longer distances or to multiple 
nodes. A very promising solution is to prepare separate atom-photon 
entanglement in two remote nodes and to distribute the photons 
to an intermediate node for interference. Proper measurement of the 
photons will project the atoms into a remote entangled state. Although 
the photons will still undergo transmission losses, the success of remote 
atomic entanglement will be heralded by the measurement of photons. 
Therefore, if the atomic states can be stored efficiently for a sufficiently 
long duration, multiple pairs of heralded atomic entanglement could 
be further connected efficiently to extend entanglement to longer 
distances or over multiple quantum nodes through entanglement 
swapping, thus making quantum-internet-based applications feasible. 

Towards this goal, a great number of experimental investigations 
have been made with many different matter systems, each of 
which has its own advantages in enabling different capabilities. So far, 
entanglement of two stationary qubits has been achieved with atomic 
ensembles, single atoms, nitrogen vacancy centres, quantum 
dots, trapped ions, and so on. Nevertheless, for all systems, the 
maximum distance between two physically separated nodes remains 
1.3 km (ref. 10). To extend the distance to the city scale, there are three 
main experimental challenges, which are: to achieve bright (that is, 
efficient) matter-photon entanglement, to reduce the transmission 
losses, and to realize stable and high-visibility interference in long 
fibres. In this Article we combine an atomic-ensemble-based quantum 
memory with efficient quantum frequency conversion (QFC), 
and we realize the entanglement of two quantum memories via fibre 
transmission over dozens of kilometres. We make use of cavity 
enhancement to create a bright source of atom-photon entanglement. We 
employ the differential-frequency generation (DFG) process in a 
periodically poled lithium niobate waveguide (PPLN-WG) chip to shift the 
single-photon wavelength from the near-infrared part of the spectrum 
to the telecommunications O band for low-loss transmission in optical 
fibres. We then make use of a two-photon interference scheme¹⁷,¹⁸ to 
entangle two atomic ensembles over 22 km of field-deployed fibres. 
Moreover, we make use of a single-photon interference scheme to 
entangle two atomic ensembles over 50 km of coiled fibres. Our work 
can be extended to long-distance separated nodes as a functional 
segment for atomic quantum networks and quantum repeaters 
and should soon enable repeater-based quantum communications, 
paving the way towards building large-scale quantum networks over 
long distances in a scalable way. 
Fig. 1 | Schematic of the remote entanglement generation between atomic 
ensembles. Two quantum memory nodes (nodes A and B in one laboratory) are 
linked by fibres to a middle station for photon measurement. In each node, a 
⁸⁷Rb atomic ensemble is placed inside a ring cavity. All atoms are prepared in 
the ground state at first. We first create a local entanglement between atomic 
ensemble and a write photon by applying a write pulse (blue arrow). Then the 
write-out photon is collected along the clockwise (anticlockwise) cavity mode 
and sent to the QFC module. With the help of a PPLN-WG chip and a 1,950-nm 
pump laser (green arrow), the 795-nm write-out photon is converted to the 
telecommunications O band (1,342 nm). The combination of a half-wave-plate 
(HWP) and a quarter-wave-plate (QWP) improves the coupling with the 
transverse magnetic polarized mode of the waveguide. After noise filtering, 
two write-out photons are transmitted through long fibres, interfered inside a 
beamsplitter and detected by two superconducting nanowire single-photon 
detectors (SNSPDs) with efficiencies of about 50% at a dark-count rate of 
100 Hz. The effective interference in the middle station heralds two entangled 
ensembles. Fibre polarization controllers (PCs) and polarization beamsplitters 
(PBSs) before the interference beamsplitter (BS) are intended to actively 
compensate polarization drifts in the long fibre. To retrieve the atom state, we 
apply a read pulse (red arrow) counter-propagating to the write pulse. By 
phase-matching the spin-wave and cavity enhancement, the atomic state is 
efficiently retrieved into the anticlockwise (clockwise) mode of the ring cavity. 
DM refers to dichroic mirror, LP refers to long-pass filter and BP refers to 
band-pass filter. 


Quantum memory with telecommunications interface 


Our experiment consists of two similar nodes linked via long-distance 
fibres, as shown in Fig. 1. In each node, an ensemble of about 10° atoms 
trapped and cooled by laser beams serves as the quantum memory. 
All atoms are initially prepared in the ground state $|g\rangle$. Following the 
Duan-Lukin-Cirac-Zoller protocol, in each trial, a weak write pulse 
coupling ground-state atoms to the excited state $|e\rangle$ induces a 
spontaneous Raman-scattered write-out photon together with a collective 
excitation of the atomic ensemble in a stable state $|s\rangle$ with a small 
probability $\chi$. The collective excitation can be stored for a long time and 
later be retrieved on demand as a read-out photon in a phase-matching 
mode by applying the read pulse, which couples to the transition 
$|s\rangle \leftrightarrow |e\rangle$. The write-out and read-out photons are nonclassically correlated. 
By employing a second Raman-scattering channel $|g\rangle \to |e\rangle \to |s'\rangle$, we 
can create the entanglement between the polarization of the write-out 
photon and the internal state ($|s\rangle$ or $|s'\rangle$) of the atomic ensemble. 
To further enhance the readout efficiency and suppress noise from 
control beams, we build a ring cavity with a finesse (a figure of merit that 
quantifies the quality of the cavity) of 23.5 around the atomic ensemble. 
The ring cavity not only enhances the retrieval but also serves as 
a filter to eliminate the necessity of using external frequency filters 
to suppress noise. 

To create remote atomic entanglement over a long distance, it is 
crucial that the photons are suitable for low-loss transmission in optical 
fibres. Therefore we shift the wavelength of the write-out photon 
from the near-infrared (3.5 dB km⁻¹ at 795 nm) to the telecommunications 
O band (0.3 dB km⁻¹ at 1,342 nm) via the DFG process. We make 
use of reverse-proton-exchange PPLN-WG chips. Optimal coupling 
efficiency and transmission for the 795-nm signal and the 1,950-nm 
pump are simultaneously achieved in one chip by an integrated structure 
consisting of two waveguides (see Supplementary Information). 
Figure 2a shows that its conversion efficiency is up to $\eta_{\rm conv} \approx 70\%$ using 
a 270-mW pump laser. During the conversion, there are three main 
spectral components of noise: at 1,950 nm, at 975 nm and at 650 nm, 
which come from the pump laser and its second and third harmonic 
generation. They are all spectrally far enough away from 1,342 nm that 
we can cut them off via the combination of two dichroic mirrors and 
a long-pass filter with a transition wavelength of 1,150 nm. The pump 
laser also induces broadband Raman noise, the spectral brightness of 
which around 1,342 nm is measured to be about 500 Hz nm⁻¹. Thus, 
we use a bandpass filter (centred at 1,342 nm, with linewidth 5 nm) to 
confine this noise to approximately 2.5 kHz, which corresponds to a 
signal-to-noise ratio of >20:1, as depicted in Fig. 2a. The filtering process 
induces only 20% loss, and fibre coupling causes an extra 40% loss. The 
end-to-end efficiency of our QFC module is $\eta_{\rm QFC} = 33\%$, which is the 
highest value for all memory telecommunications quantum interfaces 
reported so far, to the best of our knowledge. In addition, we perform 
a Hanbury-Brown-Twiss experiment for the write-out photons with 
and without QFC, with the results shown in Fig. 2b, which verify that 
the single-photon quality is well preserved during QFC. 
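
The numbers in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below assumes that the end-to-end efficiency is simply the product of the conversion efficiency and the two quoted transmissions, and uses energy conservation for the difference-frequency process; the function names are hypothetical.

```python
def dfg_output_wavelength_nm(signal_nm, pump_nm):
    """Energy conservation for difference-frequency generation: 1/out = 1/signal - 1/pump."""
    return 1.0 / (1.0 / signal_nm - 1.0 / pump_nm)


def end_to_end_efficiency(conversion, filter_loss, coupling_loss):
    """Assumed product of the conversion efficiency and the two transmission factors."""
    return conversion * (1.0 - filter_loss) * (1.0 - coupling_loss)


def filtered_noise_rate_hz(brightness_hz_per_nm, bandwidth_nm):
    """Raman noise passing through a bandpass filter of the given width."""
    return brightness_hz_per_nm * bandwidth_nm


wavelength = dfg_output_wavelength_nm(795.0, 1950.0)  # close to 1,342 nm
eta_qfc = end_to_end_efficiency(0.70, 0.20, 0.40)     # about 0.34, close to the quoted 33%
noise = filtered_noise_rate_hz(500.0, 5.0)            # 2,500 Hz, i.e. 2.5 kHz
```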


Entanglement over 22 km of field fibres 


We first perform a two-node experiment via two-photon interference 
(TPI). In each node, we create entanglement between polarization of 
the write-out photon and the internal state of the collective excitation 
via a ‘double-Λ’-shaped level scheme (see Supplementary Information 
for level details). The entangled state can be expressed as 
$(|\uparrow\rangle|\circlearrowright\rangle + |\downarrow\rangle|\circlearrowleft\rangle)/\sqrt{2}$, where $|\uparrow\rangle$ or $|\downarrow\rangle$ denotes an atomic excitation in $|s\rangle$ 
or $|s'\rangle$ respectively, and $|\circlearrowright\rangle$ and $|\circlearrowleft\rangle$ denote polarization of the write-out 
photon. To characterize the atom-photon entanglement, we perform 
quantum state tomography, with the result shown in Fig. 3. We obtain 
a fidelity of 0.930(6) for node A and of 0.933(6) for node B when 
$\chi = 0.019$. The two nodes are located in one laboratory on the east 
campus of the University of Science and Technology of China (31° 50′ 6.96″ N, 
117° 15′ 52.07″ E), as shown in Fig. 4a. Once the polarization entanglement 
is ready, the write-out photon is converted locally by QFC into 
the telecommunications band. Two photons from different nodes are 
transmitted along two parallel field-deployed commercial fibre channels 
(11 km per channel) from the University of Science and Technology 
of China to the Hefei Software Park (31° 51′ 6.01″ N, 117° 11′ 54.72″ E), as 
shown in Fig. 4a. Over there, we perform a Bell-state measurement by 
detecting two photons simultaneously with superconducting 
nanowire single-photon detectors. A successful Bell-state measurement 
result projects, in a heralded way, the two atomic ensembles into a 
maximally entangled state: 

$|\Psi^{\pm}\rangle_{AB} = \frac{1}{\sqrt{2}} \left( |\uparrow\rangle_A |\downarrow\rangle_B \pm |\downarrow\rangle_A |\uparrow\rangle_B \right)$ (1) 

with an internal sign determined by the measurement outcome of the 
Bell-state measurement. 

The strong polarization dependence of DFG in the PPLN-WG makes 
it difficult to perform QFC directly for a polarization-encoded photon. 
In this experiment, we transform the polarization encoding into 
time-bin encoding and let the two photonic modes pass through the QFC 
module in sequence with the same polarization. As shown in Fig. 4b, 
the transformation is realized through an asymmetric Mach-Zehnder 
interferometer (AMZI) and a fast Pockels cell 
which erases the polarization distinguishability. For the time-bin 
encoding, it is crucial that the two modes have a stable relative phase shift, 
which is realized via active stabilization of the two AMZIs. Moreover, the 
transformation into time-bin encoding offers an additional advantage 
of robustness in long-distance transmission in fibres. 

Before long-fibre experiments, we characterize the atom-atom 
entanglement locally without QFC. For the measurement of the atomic 
qubits, we first apply Raman rotations, then we retrieve the excitations 
into read-out photons and measure the polarization. Measurement 
in an arbitrary basis is realized by configuring the Raman pulses. 
Figure 5a shows the measured fidelity averaged for $|\Psi^{\pm}\rangle$ as a function 
of $\chi$. At $\chi = 2\%$, we get $F = 0.798 \pm 0.063$ for $|\Psi^{+}\rangle$ and $0.829 \pm 0.036$ for 
$|\Psi^{-}\rangle$, respectively, which are in good agreement with the theoretical 
estimation. Furthermore, the fidelity is almost independent of $\chi$, after 
subtracting the accidental coincidences that are mainly due to 
high-order excitations in the Raman scattering process. 

The field-deployed long fibre (L = 22 km) induces 8 dB of attenuation. 
Besides, the long fibre leads to random rotations of polarization. To 
optimize the indistinguishability, we apply polarization filtering for 
the photons after long-fibre transmission before the Bell-state 
measurement. In addition, to get a high filtering efficiency, we perform 
active polarization compensation by replacing the manual polarization 
controllers in Fig. 1 with electric polarization controllers and 
minimizing the reflections of the filtering polarization beamsplitters. We get 
an average efficiency of 98%, as shown in Fig. 4c. To reduce the 
background noise in the fibre channels, we carefully cover all the fusion 
points and get an average background noise of about 280 Hz (including 
dark counts of the detector). In the long-fibre case, to increase the 
count rate, we set the excitation probability to $\chi = 0.038$ and perform 
entanglement verification in a delayed-choice fashion. The measured 
visibility in the $|\uparrow\rangle/|\downarrow\rangle$ basis is $V_1 = 0.684 \pm 0.075$ for $|\Psi^{+}\rangle$ and 
$V_1 = 0.635 \pm 0.075$ for $|\Psi^{-}\rangle$. Adjusting the Raman pulse delay $\delta t$, we could 
observe a sinusoidal oscillation in the $|\uparrow\rangle \pm |\downarrow\rangle$ basis as shown in Fig. 5b 
with a visibility of $V_2 = 0.574 \pm 0.064$ for $|\Psi^{+}\rangle$ and $V_2 = 0.647 \pm 0.066$ for 
$|\Psi^{-}\rangle$. By assuming a similar visibility in the $|\uparrow\rangle \pm i|\downarrow\rangle$ basis, the 
entanglement fidelity can be estimated as $F \approx \frac{1}{4}(1 + V_1 + 2V_2) = 0.708 \pm 0.037$ 
for $|\Psi^{+}\rangle$, and $0.732 \pm 0.038$ for $|\Psi^{-}\rangle$, which greatly exceed the bound of 
$F > 0.5$ required to witness entanglement for a Bell state. The measured 
heralding rate is $P_{\rm her} = 1.46 \times 10^{-6}$, half of which is due to 
double-excitation events from a single node. Thus the entangling probability is 
estimated to be $P_{\rm ent} \approx P_{\rm her}/2 = 0.73 \times 10^{-6}$. 


Fig. 2 | Performance of the telecommunications interface. a, The conversion 
efficiency $\eta_{\rm conv}$ and the signal-to-noise ratio (SNR) vary as a function of pump 
laser power. Blue dots refer to the overall conversion efficiency of the PPLN-WG 
chip, and red triangles refer to the signal-to-noise ratio at probability $\chi = 0.015$. 
b, Measurement of the second-order correlation function $g^{(2)}$ (results of the 
Hanbury-Brown-Twiss experiment) with (red) and without (blue) QFC at 
$\chi = 0.057$. The write-out photons are only measured if the corresponding 
read-out photon is detected. The error bars represent one standard deviation. 


Fig. 3 | Tomography of the atom-photon entanglement. a, b, The 
reconstructed density matrix between the write-out photon and the atomic 
spin-wave in node A (a) and in node B (b). In each element of the matrix, the 
height of the bar represents its real part, Re(ρ), and the colour represents its 
imaginary part, Im(ρ). The transparent bars indicate the ideal density matrix of 
the maximally entangled state. 


Entanglement over 50 km of coiled fibres 


The entangling probability in the TPI experiment is low since it scales 
as $\chi^2$ and $\eta^2$, where $\eta$ is the overall optical efficiency from one node 
to the Bell-state measurement. In contrast, a single-photon interference 
(SPI) scheme gives an entangling probability that scales linearly as a 
function of $\chi$ and $\eta$. Thus, targeting a much higher entangling 
probability, we perform another two-node experiment via SPI. As shown in 
Fig. 1, two pairs of Fock-state entanglement are created at nodes A and 
B, respectively, in the form of $|0\rangle_a|0\rangle_p + \sqrt{\chi}\,|1\rangle_a|1\rangle_p$, where 0 and 1 
represent the number of photons or atomic excitations. The 
frequency-converted photons from both nodes are then transmitted along a long 
fibre, later combined through a fibre beamsplitter to perform SPI and 
eliminate the ‘which way’ (that is, which fibre path the photon travels 
through) information, and finally detected with superconducting 
nanowire single-photon detectors. A click from detectors D₁ or D₂ (shown in 
Fig. 1) heralds that two ensembles are mapped into a maximally 
entangled state: 

$|\Psi^{\pm}\rangle_{AB} = \frac{1}{\sqrt{2}} \left( |0\rangle_A |1\rangle_B \pm e^{i\Delta\varphi} |1\rangle_A |0\rangle_B \right)$ (2) 

where $\Delta\varphi$ is the accumulated phase difference between two fibre 
channels. To keep $\Delta\varphi$ in equation (2) constant, we harness an intermittent 
phase-locking loop in situ during every experimental interval to 
eliminate phase drift (see Supplementary Information). 

To verify the Fock-state atomic entanglement, we follow a protocol 
introduced in previous work. The degree of entanglement is quantified in terms 
of concurrence $C$, which is a monotone function of entanglement and 
goes from 0 for a separable state to 1 for a maximally entangled state. 
Its definition is $C = \max(0, 2|d| - 2\sqrt{p_{00}p_{11}})/P$, where $p_{ij}$ is the 
probability of having $i$ excitations in ensemble A and having $j$ excitations in 
ensemble B, $P = p_{00} + p_{01} + p_{10} + p_{11}$, $d = V(p_{01} + p_{10})/2$, and $V$ is the 
interference visibility of the single-excitation states. The excitation statistics 
$p_{ij}$ can be measured directly via photon counting of the two read-out 
modes and applying loss calibration. To measure the interference 
visibility $V$, we add a relative phase $\theta$ between two read-out modes and 
mix them via a beamsplitter. Along with the scan of $\theta$, counts in two 
output modes vary as a sinusoidal function of $\theta$, as shown in Fig. 6, and 
thus we could deduce $V$. For the short-fibre case (L = 10 m) without 
QFC, at $\chi = 0.015$, we get a concurrence of $C = 0.677 \pm 0.012$ for $|\Psi^{+}\rangle$ and 
$C = 0.711 \pm 0.012$ for $|\Psi^{-}\rangle$. In this condition, the entangling probability 
in one trial is $P_{\rm ent} = 0.014$, which is the highest probability of heralded 
remote entanglement creation to the best of our knowledge. 


Fig. 4 | Entanglement over field fibres. a, Bird’s-eye view of the remote 
entanglement experiment over the field fibre. Two quantum nodes are located 
at the University of Science and Technology of China (USTC). 
Telecommunications photons from two nodes are transmitted through two 
parallel field-deployed fibres to the middle station located at the Hefei 
Software Park. Each fibre is 11 km long and has a 4-dB attenuation for the 
1,342-nm photon. (Map data from Google, Maxar Technologies.) b, Setup for 
polarization photon QFC. Two polarization beamsplitters and a coiled 
polarization-maintaining delay fibre constitute an asymmetric Mach-Zehnder 
interferometer (AMZI). Two orthogonal polarization components ($|\circlearrowright\rangle$/$|\circlearrowleft\rangle$) of 
the 795-nm photon are separated in the time domain after the AMZI, and the 
polarization information is actively erased by a Pockels cell. Then the time-bin 
encoded photon is sent to the QFC module. c, Probability distribution of the 
reflectivity for the polarization-filtering polarization beamsplitters (shown in 
Fig. 1 after long fibres), with active compensation. The data shown was 
recorded once per second and accumulated over 24 hours. d, Background 
noise in the superconducting nanowire single-photon detector over 24 hours. 


In the long-fibre (L = 10 km and 50 km) cases, we add the QFC module. 
The phase noise during long-fibre transmission fluctuates faster than 
the bandwidth that intermittent phase-locking can handle. Hence we additionally 
insert an auxiliary continuous 1,550-nm laser beam to uninterruptedly 
monitor phase fluctuation and actively stabilize it (see Supplementary 
Information). Measured results for the read-out photon interference 
at different fibre lengths are shown in Fig. 6. By fitting the sinusoidal 
oscillations and measuring the excitation statistics, we get a 
concurrence result of $C = 0.428 \pm 0.013$ at L = 10 km and $C = 0.407 \pm 0.008$ at 
L = 50 km for $|\Psi^{+}\rangle$. For $|\Psi^{-}\rangle$, the results are $C = 0.416 \pm 0.008$ at L = 10 km 
and $C = 0.348 \pm 0.011$ at L = 50 km. Degradation of concurrence in 
comparison with the case of short fibre without QFC is mainly due to the 
remaining noise after phase stabilization (see Supplementary 
Information), which can be greatly improved by optimizing the feedback loop. 
The measured heralded entangling probability is $P_{\rm ent} = 1.57 \times 10^{-3}$ for 
the 10-km fibre and $P_{\rm ent} = 3.85 \times 10^{-4}$ for the 50-km fibre, which 
correspond to an entanglement creation time of $T_{\rm ent} = 32$ ms and $T_{\rm ent} = 0.65$ s, 
respectively. 
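
The quoted creation times can be reproduced from the estimate given in the note to Table 1, where the repetition rate is limited by the propagation delay, $R_{\rm rep} = c/L$, and $T_{\rm ent} = (R_{\rm rep} P_{\rm ent})^{-1}$. The sketch below is a minimal check under that assumption; the function names are hypothetical.

```python
# Speed of light in the fibre, as quoted in the note to Table 1 (m/s).
C_FIBRE = 2.0e8


def repetition_rate_hz(fibre_length_m):
    """Maximal repetition rate set by the propagation delay, R_rep = c / L."""
    return C_FIBRE / fibre_length_m


def creation_time_s(fibre_length_m, p_ent):
    """Heralded entanglement creation time, T_ent = 1 / (R_rep * P_ent)."""
    return 1.0 / (repetition_rate_hz(fibre_length_m) * p_ent)


t_10km = creation_time_s(10e3, 1.57e-3)  # about 0.032 s
t_50km = creation_time_s(50e3, 3.85e-4)  # about 0.65 s
```

The same function applied to the 22-km TPI experiment with $P_{\rm ent} = 0.73 \times 10^{-6}$ gives roughly 150 s, which illustrates how strongly the creation time depends on the entangling probability.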


Discussion and outlook 


We have experimentally demonstrated two feasible ways to entan- 
gle two quantum memories via long-distance photon transmission 
in optical fibres. We summarize key parameters and results in Table 1. 
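
As a quick consistency check on the TPI result, the fidelity estimate $F \approx (1 + V_1 + 2V_2)/4$ can be evaluated from the measured visibilities. The sketch assumes, as the text does, that the visibility in the third basis equals $V_2$; the function name is hypothetical.

```python
def fidelity_from_visibilities(v1, v2):
    """Estimate the Bell-state fidelity as (1 + V1 + 2 * V2) / 4."""
    return (1.0 + v1 + 2.0 * v2) / 4.0


f_plus = fidelity_from_visibilities(0.684, 0.574)   # about 0.708
f_minus = fidelity_from_visibilities(0.635, 0.647)  # about 0.732
f_mean = (f_plus + f_minus) / 2.0                   # about 0.720
```

Both estimates lie above the 0.5 bound needed to witness entanglement, and their mean matches the fidelity of 0.720 listed for TPI in Table 1.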
Fig. 5 | Characterization of the remote entanglement via TPI. a, Average 
fidelity of the remote entanglement $|\Psi^{\pm}\rangle$ generated locally as a function of $\chi$. Blue 
squares refer to the measurement result. Red triangles show the corrected 
results through deduction of accidental coincidences (see Supplementary 
Information). The error bars represent one standard deviation. Pink shading 
indicates where fidelity is not sufficient to claim entanglement. b, Coincidences 
measured in the $|\pm\rangle = |\uparrow\rangle \pm |\downarrow\rangle$ basis for the two atomic qubits, normalized by the 
total coincidence of all combinations. The Raman pulse in node A is applied 
slightly later than in node B with an offset of $\delta t$, which induces a linearly changing 
phase in $|\Psi^{\pm}\rangle$ and results in the observed oscillations. Parallel correlations ($|+\rangle|+\rangle$ or 
$|-\rangle|-\rangle$) of $|\Psi^{+}\rangle$ (blue squares) and $|\Psi^{-}\rangle$ (red triangles) are shown. Solid red and 
dashed blue lines correspond to the fitting results. The 5.4-s oscillation period 
agrees with Zeeman splitting between $|\uparrow\rangle$ and $|\downarrow\rangle$. This plot is based on 2.9 x 10* 
heralding events during a total measurement time of 487 hours over a period of 
30 days. The error bars represent one standard deviation. 


Even though the fibre distance of the SPI experiment is much longer 
than in the TPI experiment, the SPI scheme offers a much higher 
probability of entanglement creation, because only a single photon passing 
through half of the whole link is detected. In contrast, the TPI scheme 
requires the detection of two photons passing through the whole link. 
For the extension to physically separated nodes over long distances, the 
TPI scheme is straightforward, merely requiring that photons be 
indistinguishable. However, the extension of the SPI experiment requires 
more effort because the scheme is phase-sensitive. According to our 
analysis in the Supplementary Information, the main difficulty is to 
achieve phase correlation of remote independent control lasers. We 
have performed a preliminary test with two lasers locked 
independently to two ultrastable cavities, which shows that phase correlation 
can be built and remains stable for a duration that is long enough to generate 
remote entanglement (see Supplementary Information for details). 
Thus it is also feasible to extend our SPI experiment to long-distance 
separated nodes. 
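
The loss argument can be made concrete with a simple attenuation model. The sketch below assumes that each photon travels half of the total fibre length, that the SPI success probability scales with the transmission of one half-link and that the TPI probability scales with the product of both half-links; it uses the 8 dB measured over the 22-km field fibre as the attenuation and leaves out all other efficiencies. The function names are hypothetical.

```python
def transmission(attenuation_db):
    """Convert an attenuation in dB into a power transmission factor."""
    return 10.0 ** (-attenuation_db / 10.0)


def spi_transmission(db_per_km, total_length_km):
    """One photon crossing half of the link (assumed SPI scaling)."""
    return transmission(db_per_km * total_length_km / 2.0)


def tpi_transmission(db_per_km, total_length_km):
    """Two photons, each crossing half of the link (assumed TPI scaling)."""
    return spi_transmission(db_per_km, total_length_km) ** 2


alpha = 8.0 / 22.0                        # dB per km, from the 22-km field fibre
spi_22km = spi_transmission(alpha, 22.0)  # about 0.40
tpi_22km = tpi_transmission(alpha, 22.0)  # about 0.16
```

Because the SPI factor is the square root of the TPI factor, the gap between the two schemes widens as the link gets longer.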

The quantum link efficiency” 7,,,,, defined as the ratio of memory 
lifetime over entanglement generation time, is also an important figure 
of merit for two-node experiments. In our current work, decoherence 
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Fig. 6 | Characterization of the remote entanglement via SPI. When the 
atomic modesare retrieved as optical modes for interference, the photon count 
in one output mode of the fibre beamsplitter oscillates as a function of the 
relative phase 6 between the two optical modes, normalized by the total count 
of two output modes. Detector D₁ heralded events are shown in a; detector D₂
heralded events are shown in b. Blue squares, red triangles and green dots refer
to L = 10 m, 10 km and 50 km, respectively. Sinusoids with corresponding colour
(solid, dashed and dotted) show the fitting results. The 50-km result is based on 
1.7 x 10° heralding events during a total measurement time of 6 hours over a 
period of 2 days. The error bars represent one standard deviation. 


due to atom motions results in a memory lifetime of about 70 μs, which
is much shorter than the entanglement generation time. According
to our previous work on a very similar setup*, applying a three-dimensional
optical lattice can improve the lifetime to the sub-second


Table 1 | Comparison of two-node experiments

Experiment                           TPI (this work)      SPI (this work)      NV (2015; ref. '°)
Physical separation                  0.6 m                0.6 m                1.3 km
Overall fibre length, L              22 km                50 km                1.7 km
Entanglement probability, P_ent      0.73 × 10⁻⁶          3.85 × 10⁻⁴          6.4 × 10⁻⁹
Entanglement quality                 F = 0.720 ± 0.027    C = 0.378 ± 0.007    F = 0.92 ± 0.03
Entanglement creation time, T_ent    150 s                0.65 s               1.3 × 10³ s
Quantum link efficiency, η_link      1.45 × 10⁻³          0.34                 4.6 × 10⁻⁴
Assumed memory lifetime, τ_m         0.22 s (ref. **)     0.22 s (ref. “*)     0.6 s (refs. 5)


In the long-fibre case, the propagation delay results in a maximal repetition rate of R_rep = C/L,
where C = 2 × 10⁸ m s⁻¹ is the speed of light in the fibre. Thus the heralded entanglement
creation time is estimated as T_ent = (R_rep P_ent)⁻¹. For the estimation of η_link = τ_m/T_ent, we make use of
state-of-the-art lifetime results, as listed in the last row. We chose the 1.3-km nitrogen vacancy 
(NV) experiment’? for comparison, because it is the only previous two-node experiment that 
has a fibre length in the kilometre regime. 
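
As a rough numerical check of the estimates described in the table note, the relations R_rep = C/L, T_ent = (R_rep P_ent)⁻¹ and η_link = τ_m/T_ent can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, and the SPI entanglement probability is taken as 3.85 × 10⁻⁴, an assumed reading of the table entry that reproduces the listed creation time of 0.65 s.

```python
# Sketch of the Table 1 estimates. Function names are illustrative.
C_FIBRE = 2e8  # speed of light in the fibre, m/s (value quoted in the table note)


def repetition_rate(fibre_length_m):
    """Maximal repetition rate R_rep = C / L set by the propagation delay."""
    return C_FIBRE / fibre_length_m


def entanglement_time(fibre_length_m, p_ent):
    """Heralded entanglement creation time T_ent = 1 / (R_rep * P_ent)."""
    return 1.0 / (repetition_rate(fibre_length_m) * p_ent)


def link_efficiency(memory_lifetime_s, fibre_length_m, p_ent):
    """Quantum link efficiency eta_link = tau_m / T_ent."""
    return memory_lifetime_s / entanglement_time(fibre_length_m, p_ent)


# SPI column: L = 50 km, P_ent assumed to be 3.85e-4, tau_m = 0.22 s.
spi_time = entanglement_time(50e3, 3.85e-4)
spi_eta = link_efficiency(0.22, 50e3, 3.85e-4)
print(f"SPI: T_ent = {spi_time:.2f} s, eta_link = {spi_eta:.2f}")
```

With these inputs the sketch gives values close to the 0.65 s and 0.34 listed for SPI; the other columns can be checked in the same way once their exponents are fixed.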


regime (τ_m = 0.22 s), on the basis of which the quantum link efficiency
is estimated to be η_link = 0.34 for SPI and η_link = 2.9 x 107 for TPI. For
further improvement of η_link, it is crucial to increase the entanglement
generation rate. For example, one may use Rydberg blockade to
inhibit the high-order excitations during atom-photon entanglement 
preparation, and make the preparation process deterministic»*°. One 
can also make use of the multiplexing technique’ © to prepare mul- 
tiplexed atom-photon entanglement. Shifting the wavelength to the 
telecommunications C band, optimizing the coupling efficiencies and 
using better detectors will also greatly increase the remote entangle- 
ment rate. 

Extending these experiments to nodes separated by much longer 
distances will enable us to perform advanced quantum information 
tasks, such as efficient quantum teleportation over long distances.
By incorporating more quantum memories, our experiment may be 
extended to entangle multiple quantum memories over long distances 
via multi-photon interference*. One may also create two pairs of remote 
atomic entanglement over two sub-links and extend the distance of 
atomic entanglement via entanglement swapping, following the quan- 
tum repeater scheme”®. Concatenating this process could extend the 
distance sufficiently to beat the limit of direct transmission”. 
Methods


Time sequences 

Our experiment runs periodically, with each period being composed of 
an atomic loading phase and an entangling phase. Each loading phase 
takes 18 ms, during which we reload and cool the atoms and perform 
active phase-locking. In the entangling phase lasting 2 ms, we repeat 
entangling trials. For the SPI experiment, each trial lasts 5 μs including
3 μs for optical pumping and 2 μs for the write and read process. For the
TPI experiment, each trial lasts 11 μs including 3 μs for optical pumping
and 8 μs for the write and read process. The storage duration (relative
delay of the read pulse in comparison with the write pulse) is 7 μs for
the TPI experiment and 100 ns for the SPI experiment. 
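
The timing budget above can be turned into trial counts with a few lines of arithmetic. The following is a minimal sketch, assuming the trial durations are in microseconds (5 μs for SPI and 11 μs for TPI) and ignoring any switching overhead between trials; the constant and function names are illustrative.

```python
# Timing budget of one experimental period (all durations in nanoseconds).
LOADING_NS = 18_000_000     # 18 ms atomic loading phase
ENTANGLING_NS = 2_000_000   # 2 ms entangling phase


def trials_per_phase(trial_ns):
    """Number of complete entangling trials that fit in one entangling phase."""
    return ENTANGLING_NS // trial_ns


def trials_per_second(trial_ns):
    """Average trial rate, including the dead time of the loading phase."""
    periods_per_second = 1_000_000_000 // (LOADING_NS + ENTANGLING_NS)
    return trials_per_phase(trial_ns) * periods_per_second


# Fraction of each period that is spent on entangling trials.
duty_cycle = ENTANGLING_NS / (LOADING_NS + ENTANGLING_NS)

spi_trials = trials_per_phase(5_000)    # assumed 5 us per SPI trial
tpi_trials = trials_per_phase(11_000)   # assumed 11 us per TPI trial
print(duty_cycle, spi_trials, tpi_trials, trials_per_second(5_000))
```

Under these assumptions only one tenth of each period is available for entangling trials, which is one reason the local trial rate is well below the inverse of the trial duration.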


Data availability 

The data that support the plots within this paper and other findings of 
this study are available from the corresponding author upon reason- 
able request. 
Quantum cascade lasers are compact, electrically pumped light sources in the
technologically important mid-infrared and terahertz region of the electromagnetic 
spectrum?”. Recently, the concept of topology’ has been expanded from condensed 
matter physics into photonics’, giving rise to a new type of lasing” * using
topologically protected photonic modes that can efficiently bypass corners and 
defects‘. Previous demonstrations of topological lasers have required an external 
laser source for optical pumping and have operated in the conventional optical 
frequency regime> ®. Here we demonstrate an electrically pumped terahertz quantum 
cascade laser based on topologically protected valley edge states’ ". Unlike 
topological lasers that rely on large-scale features to impart topological protection, 
our compact design makes use of the valley degree of freedom in photonic crystals’, 
analogous to two-dimensional gapped valleytronic materials”. Lasing with regularly 
spaced emission peaks occurs in a sharp-cornered triangular cavity, even if
perturbations are introduced into the underlying structure, owing to the existence of 
topologically protected valley edge states that circulate around the cavity without 
experiencing localization. We probe the properties of the topological lasing modes by 
adding different outcouplers to the topological cavity. The laser based on valley edge 
states may open routes to the practical use of topological protection in electrically 
driven laser sources. 


Quantum cascade lasers (QCLs) are electrically pumped semiconductor 
lasers based on intersubband electron transitions within multiple quantum
wells in semiconductors'”. They are among the most important
sources of mid-infrared and terahertz (THz) radiation owing to their 
compactness, electrical pumping performance and high efficiency”. 
Their practical applications include telecommunication™, THz signal 
processing”, imaging”, sensing and spectroscopy. As with any laser, 
the emission characteristics of a THz QCL depend on the design of 
the photonic cavity and are generally strongly affected by the cavity 
shape’”"®. One promising design is the use of topological edge states, 
which form running-wave modes that are robust against perturbations 
to the underlying structure> * and can efficiently bypass defects (which 
may arise during fabrication and packaging) and sharp corners. Unlike 
conventional waves, topological edge states resist the formation of 
localized standing-wave modes, which is helpful for suppressing the 
spatial hole-burning effect’’”°. This is a particularly important con- 
sideration for QCLs because their gain recovery processes are faster 
than the carrier diffusion, unlike in traditional semiconductor lasers”. 

Topological edge states arise at the interface between spatial 
domains that have topologically distinct bandstructures’. There 
have been substantial efforts to implement such states in photonics, 
motivated by potential applications in robust optical delay lines”, 


amplifiers” and other devices”. Topological lasers have been realized 
in one-dimensional (1D) Su-Schrieffer-Heeger (SSH)-like systems”°””, 
whose edge states act as high-Q (quality factor) nanocavity modes that 
lase under suitable gain. However, the edge states of 1D lattices do not 
support protected transport. For two-dimensional (2D) lattices, real- 
izing photonic topological edge states typically requires some means 
of effective breaking of time-reversal (T) symmetry to avoid the need 
to use magnetic materials*. For example, a recent demonstration of 
2D topological lasing*’ used an array of ring resonators in which the 
clockwise (CW) or counterclockwise (CCW) circulation of light in the 
resonators acts as a photonic pseudospin; staggered inter-resonator 
couplings generate an effective magnetic field and hence a T-broken 
band structure with non-trivial topology for each pseudospin”. This 
design requires large-scale structural features (for example ring reso- 
nators) far exceeding the operating wavelength. 

Valley photonic crystals (VPCs)'°" are photonic analogues of 2D valleytronic
materials” that host topological edge states protected by a
valley degree of freedom established by the underlying lattice sym- 
metry. They have been demonstrated in several photonic crystal geometries*® °°,
and similar valley-protected edge states have been realized in
sonic crystals”. In 2D materials, the valley degree of freedom can func- 
tion similarly to spin in a spintronic device but does not require strong 
Fig. 1 | Design of a terahertz quantum cascade laser with topologically
protected valley edge modes. a, Each unit cell of the valley photonic crystal
contains a quasi-hexagonal hole perforated through the top metal and the
semiconductor layer in a metal-semiconductor-metal structure. The lattice
period is a = 19.5 μm. b, Band structure calculated by 3D finite-element
simulation. c, Projected band diagram for a supercell representing a straight
domain wall separating two domains with opposite hole orientations, with 10


spin-orbit coupling”. Likewise, VPCs can provide robust light trans- 
port in highly compact structures with periodicity of the order of the 
wavelength’°”, without the need for magnetic materials or the complex 
construction of photonic pseudospins. They are therefore promising 
for the implementation of compact topological photonic crystal lasers. 

We have realized electrically pumped THz QCLs using the topologi- 
cal edge states of a VPC. Lasing is achieved using a topological wave- 
guide that forms a triangular loop, very different from conventional 
smoothly shaped optical cavities. Despite the sharp corners of the 
cavity, we find that the lasing spectrum exhibits robust regularly spaced 
emission peaks, a feature that persists under disturbances including 
a point outcoupling defect along an arm or at a corner of the trian- 
gle; an array of outcoupling defects surrounding the triangle; and an 
external waveguide acting as a directional outcoupler. By exploring 
different configurations of defects and coupled waveguides, we show 
that the various properties of the lasing modes can be explained by, and 
are consistent with, the topological valley edge states of the VPC. We 
show that in a comparable cavity based on a conventionally designed
photonic crystal defect waveguide (Extended Data Fig. 8), the lasing 
modes behave very differently: they tend to be localized and exhibit 
highly irregular mode spacings. 

Our design consists of a triangular lattice of quasi-hexagonal holes 
drilled through the active medium of a THz QCL wafer, as shown in 
Fig. 1a. The lattice resembles a previous theoretical proposal for 
a VPC”, but with the dielectric and air regions inverted to account 
for the transverse-magnetic (TM) polarization of QCLs!”. With hex- 
agonal holes, the lattice would be inversion-symmetric, and its band 
structure would have Dirac points at the Brillouin zone corners (K and 
K’). By assigning unequal wall-length parameters d₁ and d₂ (Fig. 1a),
the inversion symmetry is broken, and bandgaps open at K and K’.
Assuming negligible coupling between the K and K’ valleys, the two
gaps are associated with opposite Chern numbers ±1/2, meaning that
they are topologically inequivalent. The Chern numbers switch sign
upon swapping d₁ and d₂ (that is, flipping the hole orientations)"°. We
characterize the photonic band structure using three-dimensional 


quasi-hexagonal holes on each side. d, Simulated electric field distribution 
(|E_z|) (top view and cross-section view) of a transmission mode in a topological
waveguide with a 120° corner. The white dashed line indicates the position of
the cross-section view. SC, semiconductor. e, SEM image of a portion of the
fabricated topological waveguide near the corner, corresponding to the area
enclosed by a white rectangle in d. Domains 1 and 2 have opposite orientations
and thus opposite valley Chern numbers. 


(3D) finite-element simulations (see Methods). With the lattice period 
a = 19.5 μm, the bulk band structure has a gap from 2.99 THz to 3.38
THz (Fig. 1b). For a straight boundary between domains of opposite 
hole orientations, the projected band diagram has a gap spanned by 
edge states with opposite group velocities in each valley (Fig. 1c and
Extended Data Figs. 1-4). These states are topologically protected 
provided that inter-valley scattering is negligible; this limitation is due 
to the overall T symmetry of the VPC”, and similar limitations apply to
other photonic topological edge states (at THz or other frequencies) 
that do not rely on magnetic materials’. Figure 1d shows simulation 
results in which a wave launched at mid-gap frequency crosses a 120° 
corner with negligible backscattering (a scanning electron microscope 
(SEM) image of such a corner is shown in Fig. 1e). Near the domain wall
(dashed line in Fig. 1e), the electric fields are concentrated in the QCL
medium, which is favourable for lasing. 

We patterned the lattice onto a THz QCL wafer (see Methods), with 
a domain wall forming a triangular loop of side length 21a (Fig. 2a). By 
design, the QCL wafer’s gain bandwidth (approximately 2.95-3.45 THz; 
see Methods and Extended Data Fig. 5) overlaps with the photonic 
bandgap. Electrical pumping is applied only to the nearest three lat- 
tice periods on each side of the domain wall, to avoid supplying gain 
to bulk modes and to achieve low total pump current’. The in-plane 
modes are vertically outcoupled by scattering through the air holes 
drilled into the QCL active region, and through the defects described 
below. Calculating the eigenmodes with realistic material losses in the 
unpumped portion of the QCL medium (see Methods), we find regularly 
spaced high-Q eigenmodes at frequencies matching the previously 
computed bandgap (Fig. 2b). The typical eigenmode field distribution 
shows uniform electric field intensities along the domain wall, even at 
the sharp corners (top of Fig. 2c). We quantified the extended nature 
of the computed eigenmodes by showing that they have significantly 
lower inverse participation ratios along the domain wall, indicating less 
mode localization, compared with the eigenmodes of a conventional 
photonic crystal cavity of similar shape and size (see Methods and 
Extended Data Fig. 6). 
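
To make the localization measure concrete, the sketch below evaluates an inverse participation ratio (IPR) for intensity samples taken along a path such as the domain wall. The normalization used here, the sum of squared intensities divided by the square of the summed intensity, is a common convention and an assumption on our part; the exact definition used for the computed eigenmodes is given in the Methods and is not reproduced here. The function name and sample data are illustrative.

```python
def inverse_participation_ratio(intensities):
    """IPR = sum(I^2) / (sum(I))^2 for non-negative intensity samples.

    A mode spread evenly over N samples gives 1/N; a mode confined to a
    single sample gives 1. Lower values therefore mean less localization.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("intensities must have a positive sum")
    return sum(value * value for value in intensities) / (total * total)


# Hypothetical sample sets: 63 points along a loop of three arms of 21 periods.
extended_mode = [1.0] * 63            # uniform intensity along the path
localized_mode = [0.0] * 62 + [1.0]   # all intensity at one point

print(inverse_participation_ratio(extended_mode))
print(inverse_participation_ratio(localized_mode))
```

In this toy comparison the uniform profile yields the smallest possible value for 63 samples, mirroring the qualitative statement that the valley edge eigenmodes are less localized than those of a conventional cavity.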
Fig. 2 | Fabrication and characterization of the topological THz QCL. a, SEM
image of the THz QCL, whose optical cavity consists of an in-plane triangular
loop of side length 21a. The yellow shaded area is pumped by electrical 
injection, while the other parts are passive. The green dashed line indicates the 
domain wall. The black rectangle indicates a defect (39 μm × 33.5 μm) etched
right through the active medium of the THz QCL. Inset, cross-sectional 
schematic and magnified view of the domain wall. b, Calculated Q factors of the 
structure’s eigenmodes, with realistic material absorption losses 
(approximately 20 cm⁻¹) within the passive region. The shaded area indicates


The regular spacing of the extended eigenmodes is a signature of 
running modes circulating around the triangular loop, analogous to 
whispering-gallery modes in a disk or a ring cavity” (see Methods).
This is the most striking feature imparted by the non-trivial topology 
of the VPC. The upper panel of Fig. 2d (labelled ‘No defect’) shows the 
experimentally measured emission spectra for this structure at two 
representative pump currents. There are regularly spaced peaks at 
3.192 THz, 3.224 THz, 3.258 THz and 3.288 THz (vertical grey lines); the 
average free spectral range (FSR) is comparable to the FSR in the eigen- 
mode simulations. The intensities are fairly low, owing to poor vertical 
outcoupling: the valley edge modes lie near K and K’, below the light 
cone, so outcoupling occurs only by air-hole scattering. To improve 
the optical outcoupling efficiency (as well as to probe the robustness 
of the regular spacing against defects), we deliberately introduce a 
small rectangular defect, about 2a long and √3a wide, drilled through
the top metal plate and the active medium in the irregular cavity loop 
(Fig. 2a). Numerical simulations show that the defect has negligible 
effects on the field distributions (Fig. 2c) regardless of whether it is 
placed on an arm or a corner of the triangle. The resulting experimental
lasing spectra exhibit substantially stronger peaks, with intensities
enhanced by 10-20 times (see Extended Data Fig. 7, where the light- 
current-voltage characteristics of the topological lasers without an 
outcoupling defect, with a side defect, and with a corner defect show 
clearly the laser threshold and the ‘roll-over’ position of the QCL), while 
the emission peaks still maintain a regular spacing and have negligible 
frequency shifts relative to the original device (middle and bottom 
panels of Fig. 2d). The preservation of the peak frequencies indicates 
that the defect does not spoil the running-wave character of the lasing 
modes. With increasing pump current, we observe variations in the 
relative peak intensities. This ‘mode-hopping’ effect can be attributed 
to mode competition as well as to band structure realignment in the 
QCL wafer with the increase in the pump current; this is also observed
in a conventional ridge laser fabricated on the same wafer (see Methods
and Extended Data Fig. 5). 
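
The regular spacing can be quantified directly from the four quoted peak frequencies. The sketch below computes the average free spectral range and, as an illustration only, converts it into an effective group index using the simple ring-resonator relation FSR = c/(n_g L), with L taken as the perimeter of the triangular loop (three arms of 21a). That relation and the resulting number are assumptions made for this example, not values reported in the text.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def mean_fsr(peaks):
    """Average spacing between consecutive peaks (same unit as the input)."""
    spacings = [upper - lower for lower, upper in zip(peaks, peaks[1:])]
    return sum(spacings) / len(spacings)


def group_index(fsr_hz, loop_length_m):
    """Effective group index from the ring-resonator relation FSR = c / (n_g L)."""
    return SPEED_OF_LIGHT / (fsr_hz * loop_length_m)


peaks_thz = [3.192, 3.224, 3.258, 3.288]   # measured peak frequencies
lattice_period_m = 19.5e-6                 # a = 19.5 um
loop_length_m = 3 * 21 * lattice_period_m  # perimeter of the triangular loop

fsr_thz = mean_fsr(peaks_thz)
n_g = group_index(fsr_thz * 1e12, loop_length_m)
print(f"mean FSR = {fsr_thz * 1e3:.1f} GHz, illustrative n_g = {n_g:.2f}")
```

The individual spacings (about 32, 34 and 30 GHz) scatter by only a few gigahertz around the mean, which is the sense in which the peaks are regularly spaced.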
the photonic bandgap of the valley Hall lattice. c, Typical eigenmode electric
field (|E_z|) profiles at around 3.23 THz, with no outcoupling defect, with a side
defect, and with a corner defect. d, Emission spectra for the QCL with no
outcoupling defect (top), with a side defect (middle), and with a corner defect
(bottom). Grey vertical lines indicate the peak frequencies of the defect-free 
QCL, which correspond closely to those of the QCL with a defect. For clarity, the 
emission spectra are vertically offset with increasing pumping currents. a.u.,
arbitrary units. 


For comparison, we fabricated a THz QCL with the same VPC design, 
but replaced the topological waveguide with a photonic crystal wave- 
guide of size-graded holes, with all holes having the same orientation 
(Extended Data Fig. 8a). As before, a defect is introduced to improve 
the outcoupling efficiency. With a side defect on the arm of the tri- 
angular cavity, the experimental spectra exhibit multiple irregularly 
spaced lasing peaks between 3.20 THz and 3.38 THz (Extended Data 
Fig. 8d). When the defect position is moved to a corner of the triangular
cavity, a completely new set of emission peaks is observed. Numerical
simulations reveal numerous eigenmodes distributed over the upper 
half of the bandgap with a range of Q factors, no evident regular spac- 
ing patterns, and with modal intensities localized on different parts 
of the triangle (Extended Data Fig. 8c). This reflects the tendency of 
conventional waveguide modes to undergo localization, unlike the 
valley edge modes. 

To probe the spatial distributions of the topological lasing modes 
and verify their running-wave nature, we fabricated another set of lasers 
that included an array of rectangular outcoupling defects arranged in 
a larger triangle enclosing the topological cavity (Fig. 3a). The defects
are separated by a distance of several wavelengths (λ) from the
domain wall and hence couple evanescently to the topological cavity 
lasing modes. We refer to the set of defects along each arm of the triangle 
as an ‘emission channel’. By selectively blocking these emission channels 
(that is, covering the defects along certain arms), we can indirectly probe
the spatial distributions of the lasing modes. When all emission channels
are open, we observe regularly spaced emission peaks corresponding 
to topological lasing modes (Fig. 3b). Next, we sequentially cover two 
emission channels and measure the emission spectra from the remaining 
channel (Fig. 3a). In all three cases, the lasing spectra and the relative 
peak intensities under different pump currents are essentially the same 
(Fig. 3c—e), indicating that the lasing modes have equal intensities on 
the three arms of the triangular loop cavity. 

The topological edge states form degenerate pairs circulating CW 
or CCW, which have the same intensity distributions, gain and vertical 
Fig. 3 | Topological laser with an array of evanescent outcouplers.

a, Schematic of the structure. A triangular loop cavity (green triangle) hosting 
topological edge states is surrounded by an array of outcoupling defects (blue 
rectangles) distributed around the perimeter of a larger triangle. The defects 
are eight lattice periods away from the topological interface, allowing for 


outcoupling rates. Coupled-mode theory predicts that each topological
lasing mode is composed of an equal-weight superposition of a CW
and CCW pair (see Methods). The coexistence of CW and CCW modes
Fig. 4 | Topological laser in a directional outcoupling configuration.

a, Schematic of the structure. A straight valley edge-state waveguide is 
introduced below the bottom arm of the triangular loop cavity (topological 
interfaces are indicated by green lines), with outcoupling gratings on the left
and right ends. The output facets are selectively covered to observe the 
directionality of the lasing modes. b, Intensity distribution for a typical


evanescent outcoupling. The inset shows different defect-covering 
configurations for the spectral measurements. b, Emission spectra at different 
pump currents (vertically shifted for clarity), with all defects uncovered. 

c-e, Emission spectra at various pump currents for the three different defect- 
covering configurations shown in the inset of a. 


also explains why the defect along the cavity in Fig. 2 does not spoil the 
running-wave character, even in the presence of backscattering induced
by the defect. To test this, we fabricated a sample with an additional 
topological eigenmode obtained via a 3D numerical calculation. c,d, Emission
spectra for the topological lasing modes (c) and non-topological lasing 
modes (d) with left and right output facets covered. For the topological lasing 
modes, the spectra have similar peak intensities, whereas for the non- 
topological lasing modes the spectra are completely different. The two sets of 
lasing peaks are measured under different pump currents. 
straight topological waveguide located just below the triangular laser
cavity (Fig. 4a). Each CW (CCW) cavity mode evanescently couples to 
the straight waveguide, propagates to the right (left) and then outcouples
via a second-order grating. This sample is found to support three
topological lasing modes with frequencies near 3.2 THz. By selectively 
covering the left or the right side of the device, we observe that each 
lasing mode emits with approximately equal intensities from the two 
facets (Fig. 4c and Extended Data Fig. 10a), indicating that the CW and 
CCW cavity modes have equal weights. For comparison, we observe 
that the same sample also supports non-topological lasing modes in 
a neighbouring frequency range, just above the photonic bandgap 
(around 3.4 THz), at high pumping currents, for example at 2.96 A. 
The non-topological lasing modes are observed to emit with very dif- 
ferent intensities from the two output facets (Fig. 4d and Extended 
Data Fig. 10b). This demonstrates a qualitative difference in behaviour 
between topological and non-topological lasing modes in a single
device. 
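
The equal-weight prediction can be illustrated with a two-mode toy model. The sketch below assumes a degenerate CW/CCW pair at frequency omega0 coupled by a single Hermitian (lossless) backscattering coefficient beta; this form of the coupling, the function name and the numerical values are assumptions for illustration, and the coupled-mode analysis in the Methods may differ in detail. For any non-zero beta the two eigenmodes contain the CW and CCW components with equal weights.

```python
import cmath
import math


def coupled_modes(omega0, beta):
    """Eigenmodes of the 2x2 matrix [[omega0, beta], [conj(beta), omega0]].

    Returns a list of (frequency, (a_cw, a_ccw)) pairs. The eigenvectors are
    (1, +exp(-i*phi)) / sqrt(2) and (1, -exp(-i*phi)) / sqrt(2), where phi is
    the phase of beta.
    """
    magnitude = abs(beta)
    phase = cmath.phase(beta)
    modes = []
    for sign in (+1, -1):
        frequency = omega0 + sign * magnitude
        a_cw = 1.0 / math.sqrt(2.0)
        a_ccw = sign * cmath.exp(-1j * phase) / math.sqrt(2.0)
        modes.append((frequency, (a_cw, a_ccw)))
    return modes


# Hypothetical values: a 3.2 THz mode pair with a weak backscattering coupling.
omega0 = 3.2
beta = 0.001 * cmath.exp(0.7j)
for frequency, (a_cw, a_ccw) in coupled_modes(omega0, beta):
    print(frequency, abs(a_cw) ** 2, abs(a_ccw) ** 2)
```

In this picture backscattering from a defect fixes the relative phase of the CW and CCW components and slightly splits the pair, but does not change their relative weights.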

In summary, we have implemented electrically pumped lasers based
on the topological edge states of a valley photonic crystal, operating 
in the THz frequency regime. By investigating several different device
configurations, we have established a chain of evidence demonstrating 
the running-wave features of the topological lasing modes. The most 
noteworthy observation is the regular mode spacing, which arises 
because the modes have running-wave characteristics despite the 
sharp corners of the cavity and various other disturbances. Looking 
ahead, there are further opportunities in using the valley degree of 
freedom in other active photonic devices, and the realization of an 
electrically pumped topological laser points the way towards incor- 
porating topological protection into practical device applications. 
Apart from promising applications as a robust THz light source, this 
QCL platform may find immediate use in exploring the dynamical and 
nonlinear features of topological laser modes”. 
Methods


Device fabrication, characterization and numerical simulations 
We used THz QCL wafers with a three-well resonant-phonon GaAs/Al₀.₁₅Ga₀.₈₅As
design, with the gain curve spanning 2.95 THz to 3.45 THz
(ref. **). The photonic crystal structures were patterned onto the wafer 
with a standard metal-semiconductor-metal configuration®, as shown
in Fig. 1a. The topological waveguide consists of quasi-hexagonal 
holes with opposite orientations on either side of the topological 
interface, with wall lengths d₁ = 0.58a and d₂ = 0.26a (or vice versa),
where a = 19.5 μm is the lattice period. The outcoupling defect for the
sample shown in Fig. 2 consists of a rectangular hole with fixed size of
39 μm × 33.5 μm. The outcoupling defects for the sample shown in Fig. 3
consist of 12 rectangular holes of the same size, uniformly distributed 
along three triangle arms and situated eight lattice periods away from 
the topological interface. 

The fabrication process began with metal (Ti/Au, 20/700 nm) deposi- 
tion by an electron-beam evaporator onto the THz QCL wafer and an 
n⁺-doped GaAs host substrate, followed by Au/Au thermo-compression
wafer bonding. Wafer polishing and selective wet etching using
NH₃:H₂O/H₂O₂/H₂O (3/57/120 ml) solution were sequentially conducted
to remove the THz QCL substrate down to an etch-stop layer. The etch-stop
layer was then removed by 49% hydrofluoric acid solution, and
the QCL active region was exposed for subsequent microfabrication.
A 300-nm SiO₂ insulation layer was deposited onto the THz QCL wafer
using plasma-enhanced chemical vapour deposition, followed by opti- 
cal lithography and reactive-ion etching (RIE) to define the pumping 
area. The photonic structure patterns were transferred onto the THz 
QCL wafer by optical lithography, with deposition and lift-off of the 
top metal layer (Ti/Au, 20/900 nm). With the top metal layer as a hard 
mask, the photonic structures were formed by reactive-ion dry etching
through the active region with a gas mixture of BCl₃/CH₄ = 100/20
standard cubic centimetres per minute. The top metal layer (remnant
thickness approximately 300 nm) was retained as a top contact for current
injection. The host substrate was covered by a Ti/Au (15/200 nm)
layer as bottom contact. Finally, the device chip was cleaved, indium-soldered
onto a copper heatsink, wire-bonded and attached to a cryostat
cold finger for characterization.

The fabricated THz laser devices were characterized using a Bruker 
Vertex 70 Fourier-transform infrared spectrometer with a room-temperature
deuterated-triglycine sulfate detector. Mounted in a helium-gas-stream
cryostat with temperature control at 9 K, the devices were
driven by a pulser with repetition rate of 10 kHz and pulse width of 
500 ns. The emission signal was captured by the detector in the vertical 
direction and Fourier-transformed into a spectrum, with the spectrometer
scanner velocity of 1 kHz and spectrum resolution of 0.2 cm⁻¹. To
measure the emission from different outcouplers, for example, the rectangular
outcoupling defects or gratings, a thin metal sheet (approximately
100 μm) coated with an absorptive PMMA layer (approximately
100 μm) was used to cover the device emission surface partially. The
absorption layer (single-pass absorption rate approximately 40%) 
was coated to reduce the light reflection from the metal sheet. The 
cover was positioned using a custom stage with a positional accuracy 
of about 20 um. The cover was placed very close to the device surface: 
the gap between the device surface and the metal sheet was smaller 
than 300 μm.

In this work, all numerical results were calculated using the finite- 
element method simulation software COMSOL Multiphysics. In 3D band 
diagram calculations, the 10-μm-thick QCL medium was modelled as a
lossless dielectric with a refractive index of 3.6, sandwiched between 
metal layers modelled as perfect electrical conductors. All band struc- 
tures were computed for TM polarization. The projected band diagram 
in Fig. 1c was obtained with a supercell with 10 quasi-hexagonal holes on
each side of the domain wall; spurious modes localized at the bounda- 
ries of the computational cell were removed before plotting. In 3D 


eigenmode calculations, the unpumped portion of the QCL medium 
was modelled as a lossy dielectric, accounting for the intrinsic loss of 
the actual semiconductor medium; the imaginary part of the refractive 
index is 0.0159, corresponding to an absorption loss of about 20 cm⁻¹.
To reduce computational workload, eigenmodes were computed for a
slightly smaller structure with several outermost unit cells removed, 
but with the triangular loop cavity left unchanged. 
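
The correspondence between the imaginary part of the refractive index and the quoted absorption loss follows from the standard relation α = 4πκ/λ₀, where λ₀ is the free-space wavelength. The sketch below evaluates it at 3.0 THz; the choice of reference frequency is an assumption, since the text does not state the frequency at which the loss is quoted, and the function name is illustrative.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def absorption_per_cm(kappa, frequency_hz):
    """Power absorption coefficient alpha = 4*pi*kappa / lambda0, in cm^-1."""
    wavelength_m = SPEED_OF_LIGHT / frequency_hz
    alpha_per_m = 4.0 * math.pi * kappa / wavelength_m
    return alpha_per_m / 100.0  # convert m^-1 to cm^-1


kappa = 0.0159  # imaginary part of the refractive index used in the model
alpha_3thz = absorption_per_cm(kappa, 3.0e12)
print(f"alpha at 3.0 THz = {alpha_3thz:.1f} cm^-1")
```

Across the 2.99 to 3.38 THz bandgap the same expression gives values between roughly 20 and 23 cm⁻¹, consistent with the stated loss of about 20 cm⁻¹.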


Valley photonic crystal design 

Extended Data Fig. 1a shows the 2D band structure of a triangular-lattice
photonic crystal whose unit cell comprises a regular hexagonal air hole
in the dielectric of refractive index 3.6. This dielectric medium repre- 
sents the QCL wafer medium in the actual device. The band structure 
exhibits Dirac points—linear band-crossing points between the two 
lowest TM photonic bands—at the corners of the hexagonal Brillouin 
zone, denoted by K and K’. Near K (K’), the Bloch states can be described 
by an effective 2D Dirac Hamiltonian”: 


$$H = v_D\,(\pm q_x \sigma_x + q_y \sigma_y) \tag{1}$$


where $q = (q_x, q_y)$ is the wavevector measured from K (K’), $v_D$ is the group
velocity, $\sigma_x$ and $\sigma_y$ are the first two Pauli matrices, and the + (−) sign corresponds
to K (K’).

Setting $d_1 \neq d_2$ breaks the $C_{6v}$ symmetry of the photonic crystal, and
lifts the degeneracy of the Dirac points, as shown in Extended Data
Fig. 1b. In Extended Data Fig. 1c, d, we plot the absolute values of the
out-of-plane electric field $|E_z|$ and Poynting vectors within each unit
cell at the K and K’ points for both the lower band and upper band. The
modes in the two valleys are time-reversed counterparts, as shown by
the opposite circulations of electromagnetic power. 

The effect of the symmetry breaking can be modelled as a mass term
added in the effective Dirac Hamiltonian:


$$H = v_D\,(\pm q_x \sigma_x + q_y \sigma_y) + v_D^2 m \sigma_z \tag{2}$$


where m represents the effective mass of Dirac particles, and $\sigma_z$ is the
third Pauli matrix. The band structures near the two valleys (that is, K 
and K’) have identical dispersion but are topologically distinct. This 
can be shown by computing the valley-projected Chern number’, 
defined as 


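$$C_{K/K'} = \frac{1}{2\pi} \int_{\mathrm{HBZ}} \Omega_{K/K'}(q)\, \mathrm{d}S \tag{3}$$


where the integration is performed only for half of the Brillouin zone
(HBZ) containing K or K’. Here $\Omega_{K/K'}(q)$ is the Berry curvature, defined
as $\Omega = \nabla_k \times A(k)$, where $\nabla_k = (\partial/\partial k_x, \partial/\partial k_y)$ and $A(k)$ represents the Berry
connection, that is,


$$A_n(k) = i \int_{\mathrm{unit\ cell}} \mathrm{d}^2 r\; u_n^*(r)\, \nabla_k u_n(r)$$


where $u_n(r)$ represents the Bloch wavefunctions, which can be calculated from
numerical simulation.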

Extended Data Fig. 2 shows the numerically calculated Berry curvature
near the K and K’ points, whose integration over the HBZ gives rise to
opposite valley Chern numbers, that is, $C_{K'} = 1/2$ and $C_K = -1/2$. Rotating
the quasi-hexagonal motif by 180° is equivalent to flipping the sign of
the mass parameter m, which flips the signs of the valley Chern numbers
($C_{K'} = -1/2$, $C_K = 1/2$).
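
The half-integer valley Chern numbers can be checked numerically on the effective model of equation (2). The sketch below is an illustration only, not the authors' calculation, which uses simulated Bloch wavefunctions. It sets $v_D = 1$, replaces the half Brillouin zone by a finite square window in q (an assumption), and sums gauge-invariant plaquette Berry phases of the lower band. The function name `valley_chern` and its parameters are hypothetical, and the overall sign depends on band and orientation conventions.

```python
import numpy as np

def valley_chern(m, tau, q_max=40.0, n=241):
    """Valley Chern number of the lower band of
    H = tau*qx*sx + qy*sy + m*sz (v_D = 1), from plaquette Berry phases
    summed over the square window |qx|, |qy| <= q_max."""
    q = np.linspace(-q_max, q_max, n)
    qx, qy = np.meshgrid(q, q, indexing="ij")
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    # Stack of 2x2 Hamiltonians, shape (n, n, 2, 2)
    H = tau * qx[..., None, None] * sx + qy[..., None, None] * sy + m * sz
    _, vecs = np.linalg.eigh(H)
    u = vecs[..., :, 0]  # lower-band eigenvectors, shape (n, n, 2)

    def link(a, b):
        # Normalized overlap <a|b> between neighbouring grid points
        z = np.sum(np.conj(a) * b, axis=-1)
        return z / np.abs(z)

    ux = link(u[:-1, :], u[1:, :])   # links along qx, shape (n-1, n)
    uy = link(u[:, :-1], u[:, 1:])   # links along qy, shape (n, n-1)
    # Counter-clockwise product of links around each plaquette
    plaq = ux[:, :-1] * uy[1:, :] * np.conj(ux[:, 1:]) * np.conj(uy[:-1, :])
    return np.sum(np.angle(plaq)) / (2 * np.pi)

c_plus = valley_chern(m=1.0, tau=+1)
c_minus = valley_chern(m=1.0, tau=-1)
print(f"tau=+1: {c_plus:+.3f}, tau=-1: {c_minus:+.3f}")
```

With a finite window the magnitude falls slightly short of 1/2 and approaches it as `q_max/|m|` grows. Flipping either the valley index `tau` or the sign of `m` reverses the sign, which mirrors the statement that rotating the motif by 180° flips the valley Chern numbers.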

Extended Data Fig. 3 shows a sample of photonic crystal consisting 
of two domains with opposite valley Chern numbers. The differences 
in valley Chern numbers between the two domains are 


$$\Delta C_{K'} = C_{K'}^{(1)} - C_{K'}^{(2)} = 1 \quad\text{and}\quad \Delta C_K = C_K^{(1)} - C_K^{(2)} = -1 \tag{4}$$


Thus, based on the topological bulk-boundary correspondence 
principle’, there shall be one forward-propagating edge state at K’ 
and one backward-propagating edge state at K. This is verified by the
numerically calculated photonic band structure shown in Extended 
Data Fig. 3b. The field plots in Extended Data Fig. 3c, d show that the 
edge states are indeed strongly localized to the domain wall, that is, 
between the two domains with opposite valley Chern numbers. 


Comparison of 2D and 3D band structures 

In a 2D VPC with parameters stated in the main text, the bulk TM band
structure has a bandgap from 3.23 THz to 3.51 THz (a relative bandwidth
of around 8%), as shown by the black curves in Extended Data
Fig. 4a. For a 2D structure with two domains of opposite hole orientations
separated by a straight domain wall (such as in Extended Data
Fig. 3a), the projected bandgap occupies a similar frequency range, 
and the valley edge states traverse the whole projected bandgap as 
shown by the black curves in Extended Data Fig. 4b. 
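
As a quick arithmetic check of the quoted relative bandwidth, the sketch below assumes the usual convention of gap width divided by mid-gap frequency (the text does not state which convention is used). The helper name `relative_bandwidth` is hypothetical.

```python
def relative_bandwidth(f_low, f_high):
    """Gap width divided by the mid-gap frequency (same units for both)."""
    centre = 0.5 * (f_low + f_high)
    return (f_high - f_low) / centre

# 2D VPC bandgap quoted in the text, in THz
print(f"{100 * relative_bandwidth(3.23, 3.51):.1f}%")  # prints 8.3%
```

The result, about 8.3%, agrees with the "around 8%" stated for the 2D structure.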

In the actual experiment, the VPC is 3D, patterned onto a THz QCL 
wafer in a metal-semiconductor-metal configuration®. The active 
medium is 10 μm thick, sandwiched between two metal plates to ensure
subwavelength vertical confinement of the TM-polarized lasing waves 
within the active layer. Numerical results for the 3D structure are shown 
by the red curves in Extended Data Fig. 4. The band structure and pro- 
jected band diagram are shifted to lower frequencies, but otherwise 
remain qualitatively similar. 


Emission characteristics of conventional lasers (ridge laser and 
VPC laser) 

To characterize the gain spectral range and other properties of 
the THz QCL wafer, we fabricated and studied a conventional 
ridge laser. Extended Data Fig. 5a plots the emission spectra at 
different pump currents. On scanning through the entire dynamic 
range of the pump, we observe that the gain spectral range is 
approximately 2.95 THz to 3.45 THz. With increasing pump, the 
emission spectrum envelope gradually blueshifts, which is due to the 
Stark shift of the intersubband transition in the THz quantum cascade 
medium*®””. 

To align the frequency of the VPC bandgap to the gain peak of the 
THz QCL (approximately 2.9-3.45 THz, evidenced by the range of emis- 
sion peaks of the ridge laser), we fabricated a series of VPCs of various 
periods without any domain wall loop cavity. By studying the lasing 
peaks, we determined that the photonic bandgap of a VPC laser with 
a = 19.50 μm and size of approximately 820 μm × 725 μm extends from
2.99 THz to 3.39 THz, which is a good match for the gain peak range of 
the THz QCL wafer. These results also helped us to estimate the effec- 
tive refractive index of the QCL active region to be around 3.60 at the 
operation frequency. 


Extended nature of topological modes 

The key feature of the topological laser cavity is that it supports
whispering-gallery-like running-wave modes even in the presence of the three
sharp corners. By contrast, a trivial cavity cannot support such modes 
due to strong back-reflection at the corners, which localizes the elec- 
tromagnetic field at various portions of the cavity. 

This phenomenon can be quantified by calculating the inverse par- 
ticipation ratio (IPR) along the one-dimensional (1D) curve correspond- 
ing to the triangular loop. The IPR is widely used to characterize the 
localization of modes and is defined as*° 


$$\mathrm{IPR}(\omega) = \frac{L\int_L |E_z(\omega,\xi)|^4\,\mathrm{d}\xi}{\left[\int_L |E_z(\omega,\xi)|^2\,\mathrm{d}\xi\right]^2} \tag{5}$$

where ξ is the coordinate parametrizing the 1D curve of length L. The
denominator in equation (5) ensures normalization. For a mode confined
to a length $L_0$, the IPR goes as $L/L_0$, whereas for completely delocalized
modes $L_0 = L$, leading to IPR ≈ 1; with increasing localization, $L_0$ decreases
and therefore the IPR increases.


The numerical IPR results for the triangular loop cavity are shown 
in Extended Data Fig. 6. As expected, the topological modes have sub- 
stantially smaller IPR than the non-topological modes. 
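
The scaling of the IPR with localization length can be illustrated on sampled fields. This is a minimal sketch, assuming the normalization IPR = L∫|E|⁴dξ / (∫|E|²dξ)², which is the form that gives IPR ≈ 1 for a fully delocalized mode and L/L₀ for a mode confined to length L₀. The function name `ipr`, the uniform sampling and the two test fields are illustrative assumptions, not the simulated cavity modes.

```python
import numpy as np

def ipr(field, xi):
    """Inverse participation ratio of a field sampled on a uniform,
    periodic grid xi along a closed 1D curve."""
    intensity = np.abs(field) ** 2
    d_xi = xi[1] - xi[0]
    length = d_xi * len(xi)                 # total loop length L
    num = np.sum(intensity ** 2) * d_xi     # integral of |E|^4
    den = (np.sum(intensity) * d_xi) ** 2   # (integral of |E|^2)^2
    return length * num / den

L = 1257.0                                   # loop length in micrometres
xi = np.linspace(0.0, L, 4000, endpoint=False)
running_wave = np.exp(2j * np.pi * 5 * xi / L)   # uniform intensity
localized = np.where(xi < L / 4, 1.0, 0.0)       # confined to L/4

print(ipr(running_wave, xi))  # close to 1
print(ipr(localized, xi))     # close to 4, that is, L / L0
```

A running wave has uniform intensity along the loop and gives the minimum value, whereas a field confined to a quarter of the loop gives a value about four times larger.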


Topological modes in the triangular loop cavity 
Figure 2b of the main text shows the numerically calculated modes of a
triangular cavity formed between two topologically inequivalent VPC 
domains. These high-Q modes are constructed out of topological edge 
states that have the characteristics of running waves. 

From the condition that running waves should interfere construc- 
tively over each round trip, we can estimate the mode separation or 
the FSR. Constructive interference requires 


$$\Delta k = \frac{2\pi}{L} \tag{6}$$

where k denotes the wavenumber for the running-wave-like envelope
function corresponding to any given edge state, and L is the total path
length (the circumference of the triangular loop). The edge states have
an approximately linear dispersion relation $\Delta\omega = v\,\Delta k$, where ω is the
angular frequency detuning relative to mid-gap and v is the group
velocity. Hence, the FSR is

$$\Delta f = \frac{v}{L} \tag{7}$$


For the structure, L ≈ 1,257 μm, and we estimate v = 4.53 × 10⁷ m s⁻¹
from numerical calculations (Fig. 1c). This yields Δf = 0.036 THz, which
matches well with the simulations and the experimental results (for
example, Δf = 0.035 THz for the simulation results shown in Fig. 2b, and
Δf ≈ 0.033 THz in the experimental results shown in Fig. 2d).
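
The free spectral range estimate follows from Δω = vΔk with Δk = 2π/L, so that Δf = Δω/2π = v/L. A short check with the values quoted above is sketched here; the function name `free_spectral_range` is a hypothetical helper.

```python
def free_spectral_range(group_velocity, loop_length):
    """FSR in Hz for a running-wave loop cavity: delta_f = v / L."""
    return group_velocity / loop_length

v = 4.53e7      # group velocity in m/s, from the text
L = 1257e-6     # loop circumference in m, from the text
fsr_hz = free_spectral_range(v, L)
print(f"{fsr_hz / 1e12:.3f} THz")  # prints 0.036 THz
```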

Owing to time-reversal symmetry, each running-wave mode has a 
degenerate counterpart with opposite circulation direction. Hence, 
modes can be constructed from superpositions of CW and CCW run- 
ning waves. Numerical solvers typically do not return the CW and CCW 
solutions, but rather the superpositions of the two running waves. 
However, CW and CCW modes can be reconstructed from suitable 
superpositions of the degenerate solutions returned by the numerical 
solver (Extended Data Fig. 9). 

The CW and CCW valley edge modes form two orthogonal basis 
modes and thus each topological lasing mode is a superposition of CW
and CCW valley edge modes. To determine the superpositions, we can
use the framework of coupled-mode theory. There are two important
effects acting on the CW and CCW modes: weak coupling between CW
and CCW modes, induced for example by symmetry-breaking defects
in the VPC; and gain and loss, which are due to amplification by the gain
medium, material dissipation and radiative outcoupling. 

Using coupled-mode theory, we represent the states of the laser by
$\psi = (a\ \ b)^T$, where a and b are the CW and CCW mode amplitudes,
respectively. The condition for steady-state lasing is

$$[H + i(g - \gamma)]\,\psi = \delta\omega\,\psi \tag{8}$$

where

$$H = \begin{pmatrix} 0 & \kappa \\ \kappa & 0 \end{pmatrix}$$

is a Hermitian Hamiltonian containing a coupling rate κ between the CW
and CCW modes, both of which have zero frequency detuning, δω is the
frequency detuning of the steady-state lasing mode, g is the amplification
rate due to the gain medium, and γ is the loss rate due to material
dissipation and radiative outcoupling. Note that the gain is saturable.

Importantly, the non-Hermitian terms are diagonal because the CW 
and CCW modes are topologically protected running waves that have
the same intensity distribution, and therefore should experience the 
same rates of gain and loss. 


Regardless of the non-Hermitian terms, the solutions to the coupled-mode
equation are

$$\psi \propto \begin{cases} (1\ \ 1)^T & \text{for } \delta\omega = \kappa \\ (1\ \ {-1})^T & \text{for } \delta\omega = -\kappa \end{cases} \tag{9}$$

In other words, the CW and CCW modes should contribute equally to
the steady-state lasing mode. The overall amplitude can be determined
by setting the imaginary part of the eigenproblem to zero.

These results hold not only at the lasing threshold, but also in the
above-threshold regime where gain saturation is in effect. Above threshold,
provided κ is not too large, a single steady-state lasing mode is
spontaneously chosen from one of the two possible solutions solved
above, and the other solution is suppressed (that is, its amplitude is
pinned to zero) by gain competition.

The above analysis rests on the idea that the underlying a and b modes
are counter-propagating topological modes. It does not apply if the
modes experience different gain/loss rates (so that the non-Hermitian
term is non-diagonal), or if they are non-degenerate, as is the case in
the non-topological cavity, which lacks running-wave-like edge states.
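
The equal CW and CCW weights can be verified with a small eigenvalue calculation. The sketch below assumes a real coupling rate κ and a gain/loss term i(g − γ) that multiplies the identity, as the diagonal non-Hermitian terms described above suggest. The sign convention of that term, the function name `lasing_supermodes` and the numerical values are assumptions for illustration only.

```python
import numpy as np

def lasing_supermodes(kappa, gain, loss):
    """Eigenmodes of H + i(gain - loss)*I for two coupled, degenerate
    counter-propagating modes with real coupling rate kappa."""
    H = np.array([[0.0, kappa], [kappa, 0.0]], dtype=complex)
    M = H + 1j * (gain - loss) * np.eye(2)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(vals.real)       # sort by frequency detuning
    return vals[order], vecs[:, order]

vals, vecs = lasing_supermodes(kappa=0.01, gain=0.05, loss=0.05)
for dw, psi in zip(vals, vecs.T):
    a, b = psi
    print(f"detuning {dw.real:+.3f}: |a|^2 = {abs(a)**2:.2f}, |b|^2 = {abs(b)**2:.2f}")
```

Because the gain/loss term is proportional to the identity, it shifts only the imaginary part of both eigenvalues and leaves the eigenvectors unchanged, so each supermode carries equal CW and CCW intensity whatever the values of the gain and loss.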


Bidirectional outcoupling of laser modes 

Here, we provide more details about the topological laser in the direc- 
tional coupling configuration (Fig. 4 of the main text and Extended 
Data Fig. 10). 

This structure features a straight topological waveguide placed below 
the triangular cavity (Fig. 4a). The valley Chern number difference 
along the straight waveguide is opposite to that along the bottom arm 
of the triangular cavity. Owing to valley conservation, a CW (CCW) 
cavity mode evanescently couples to a right- (left-)moving valley edge
mode on the straight waveguide. The output facets on the left and right
ends of the straight waveguide are second-order gratings. After using 
numerical simulations to optimize the grating parameters, the reflec- 
tion ratio is estimated to be <10%, ensuring negligible light feedback 
into the straight waveguide and laser cavity. 

Numerical simulations of the structure reveal topological eigen- 
modes at frequencies near 3.2 THz, within the topological gap of the 
VPC. The intensity plot for a typical eigenmode is shown in Fig. 4b. 
These numerically calculated topological eigenmodes are all twofold 
degenerate, consistent with the degenerate CW and CCW cavity modes 
of the triangular loop. Moreover, the structure hosts non-topological 
lasing modes around 3.4 THz, around the edge of the upper band. The 
non-topological modes are all non-degenerate. 

In the experiment, each topological mode exhibits a ‘peak ratio’ 
(the ratio of emission peak intensities from two output facets) close 
to unity. A typical spectrum is shown in Fig. 4c, and the light-current 
curves are shown in Extended Data Fig. 10a. For the non-topological
modes, the peak ratios are far from unity (Fig. 4d and Extended Data
Fig. 10b); for some of these, the peak is only clearly observable when 
one facet is covered but lies within the noise floor when the other 
facet is covered. 

During repeated experimental runs with the same sample, we observe 
a repeatable set of peak frequencies for both the topological and
non-topological lasing modes, but the exact peak intensities vary between
runs due to the imprecise relative alignment of the covering metal 
sheet and sample. We observe that the topological modes have peak 
ratios close to unity, whereas the non-topological modes have differ- 
ent peak ratios. 


Data availability 


The data sets generated during and/or analysed during the current
study are available in the DR-NTU(Data) repository
https://doi.org/10.21979/N9/PECAGQ.
Extended Data Fig. 1 | Design of the 2D VPC. a, Photonic band structure for the
TM modes of a 2D triangular photonic crystal of hexagonal air holes in
dielectric (refractive index 3.6), with unbroken inversion symmetry. The unit
cell and Brillouin zone are shown inset. b, Band structure after breaking
inversion symmetry by setting $d_1 \neq d_2$. Inset, unit cell, with $d_1 = 0.58a$, $d_2 = 0.26a$.
The Dirac points at K and K’ are lifted. c, d, Plots of the absolute value of the
out-of-plane electric field $|E_z|$ (colour maps) and Poynting vector (white arrows)
within each unit cell at the K and K’ points. For both the lower band (c) and
upper band (d), the modes in the two valleys are time-reversed counterparts, as
shown by the opposite circulations of electromagnetic power.

Extended Data Fig. 2 | Berry curvatures calculated using 2D Bloch wavefunctions for the lowest TM band. a, Near the K’ valley. b, Near the K valley.

box). b, Projected band diagram for the supercell. The red (blue) curve at K, K’.

Extended Data Fig. 4 | Comparison between 2D and 3D TM photonic band with central dielectric thickness of 10 μm. b, Projected band diagrams for a
after the experiment, that is, metal-semiconductor-metal heterostructure clarity.

Extended Data Fig. 5 | Emission characteristics of a conventional ridge laser fabricated on the quantum cascade wafer. a, Emission spectra at different pump
currents. b, Light-current-voltage curves of the ridge laser.

Extended Data Fig. 6 | IPR for trivial and topologically non-trivial modes.
a, b, Schematics showing the topologically non-trivial (a) and trivial (b)
cavities. The 1D interfaces along which the IPR is calculated are indicated by red
and blue lines. For the design of the trivial cavity, see Extended Data Fig. 8a.
c, IPR versus frequency for eigenmodes in the band gap for each type of cavity.
The topological cavity’s eigenmodes have consistently lower IPR, indicating
that they are more uniformly extended along the loop. d–f, Intensity
distributions for three representative eigenmodes of the trivial cavity. For
comparison, eigenmodes of the topological cavity are shown in Fig. 2c (top) of
the main text.

Extended Data Fig. 7 | Light-current-voltage curves of the topological laser
with different designs. a, The topological laser without an outcoupling defect.
b, The topological laser with a side defect. c, The topological laser device with a
corner defect. The corresponding device emission spectra are shown in Fig. 2d.
All intensities in three sub-figures are measured with the same intensity scale.
It can be inferred from these curves that the emission power is greatly
enhanced by the outcoupling defect.

Extended Data Fig. 8 | Topologically trivial laser with triangular loop cavity
formed by a conventional photonic crystal waveguide. a, SEM image of the
fabricated structure. Inset, close-up view of the waveguide with single hole
orientation, which consists of five rows of size-graded holes (with size scale
factors $s_1 = 0.77$, $s_2 = 0.87$, $s_3 = 1$). A defect (39 μm × 33.5 μm) is included to
improve outcoupling efficiency. b, Calculated eigenmode Q factors for the
structure with a side defect. The shaded area indicates the photonic bandgap
of the valley Hall lattice. c, Electric field ($|E_z|$) plots for typical calculated
eigenmodes of the trivial cavity. The white square indicates the position of the
side defect. d, Emission spectra of the topologically trivial lasers with a side
defect (top panel) and corner defect (bottom panel) at different pump
currents. The spectra are vertically offset for clarity. The emission peaks of two
lasers are different and do not present a clear and regularly spaced pattern in
frequency space.

Extended Data Fig. 10 | Lasing peak intensity curves for topological and
non-topological lasing modes in the same laser device in a directional
outcoupling configuration. The schematic of the device is shown in Fig. 4a of
the main text. a, b, Here, peak intensities are plotted versus pump current for
the topological modes (a) and non-topological modes (b) of the same sample.
P1, P2 and so on represent different emission peaks. Solid (dashed) curves
correspond to the measurement with left (right) side of the device covered.
Emission spectra at two representative pump currents are shown in Fig. 4c, d of
the main text. For the topological lasing modes, the spectra from two output
facets have comparable peak intensities, whereas for the non-topological
lasing modes the peaks differ in intensity and frequency in the two cases.

Solid-state lithium metal batteries require accommodation of electrochemically
generated mechanical stress inside the lithium: this stress can be up to 1 gigapascal
for an overpotential of 135 millivolts. Maintaining the mechanical and electrochemical
stability of the solid structure despite physical contact with moving corrosive lithium
metal is a demanding requirement. Using in situ transmission electron microscopy,
we investigated the deposition and stripping of metallic lithium or sodium held within
a large number of parallel hollow tubules made of a mixed ionic-electronic conductor
(MIEC). Here we show that these alkali metals, as single crystals, can grow out of and
retract inside the tubules via mainly diffusional Coble creep along the MIEC/metal
phase boundary. Unlike solid electrolytes, many MIECs are electrochemically stable in
contact with lithium (that is, there is a direct tie-line to metallic lithium on the
equilibrium phase diagram), so this Coble creep mechanism can effectively relieve
stress, maintain electronic and ionic contacts, eliminate solid-electrolyte interphase
debris, and allow the reversible deposition/stripping of lithium across a distance of
10 micrometres for 100 cycles. A centimetre-wide full cell, consisting of
approximately 10¹⁰ MIEC cylinders/solid electrolyte/LiFePO₄, shows a high capacity
of about 164 milliampere hours per gram of LiFePO₄, and almost no degradation for
over 50 cycles, starting with a 1× excess of Li. Modelling shows that the design is
insensitive to MIEC material choice with channels about 100 nanometres wide and
10–100 micrometres deep. The behaviour of lithium metal within the MIEC channels
suggests that the chemical and mechanical stability issues with the metal-electrolyte
interface in solid-state lithium metal batteries can be overcome using this
architecture.


Demands for safe, dense energy storage provide incentive for the
development of all-solid-state rechargeable Li metal batteries. (Lithium
metal batteries are to be distinguished from lithium ion batteries,
in which the anode does not contain metallic lithium.) Lithium in the
body-centred cubic (b.c.c.) crystal structure has 10× the gravimetric
capacity and 3× the volumetric capacity of graphite. The problem is
that the non-lithium-metal volume fraction φ, consisting of entrapped
solid-electrolyte interphase (SEI) debris, pores and other ancillary/host
structures, tends to increase with battery cycling. Once a
Li-metal-containing anode has φ > 70%, it loses its volumetric advantage
compared to a graphite anode. Most solid electrolytes are
thermodynamically unstable in contact with the corrosive Li metal, forming
SEI at a fresh solid electrolyte/Li metal interface. This thermodynamic
instability can be predicted by checking the equilibrium phase diagram:
it occurs when the solid electrolyte phase does not have a direct tie-line
connecting to the Li$_{\mathrm{BCC}}$ phase. Ab initio calculations have shown that a
small number of compounds such as LiF, LiCl and Li₂O are absolutely
stable against Li metal, but they are poor ionic conductors. Good
solid electrolytes (ionic conductors but electronic insulators) will
decompose upon contact with Li$_{\mathrm{BCC}}$ to form SEI. Under large fluctuating
mechanical stresses the SEI and the solid electrolyte can spall off and get
entangled with Li; and as they are electronic insulators, they can cut off
electronic percolation and cause ‘dead lithium’. The dual requirements
of maintaining contact and adhesion with moving Li without fracture
(mechanical stability) while reducing SEI production (electrochemical
stability) make the problem hard from an electrochemo-mechanics
perspective.

Metallic lithium has a volume Ω = 21.6 Å³ per atom = 0.135 eV per GPa.
This means that an overpotential U of −0.135 V, which is frequently seen
experimentally in Li deposition, can in principle generate GPa-level
Fig. 1 | Mixed ionic-electronic conductor (MIEC) tubules as 3D Li hosts.
Schematic process of creep-enabled Li deposition/stripping in an MIEC tubular
matrix with a geometry of {h, W, w}, where Coble creep dominates via interfacial
diffusion along the MIEC/Li$_{\mathrm{BCC}}$ incoherent interface. Main panel, cross-section
of the matrix: MIEC tubules are shown as red, with white arrows indicating the
free movements of electrons (e⁻) and lithium ions (Li⁺); the three required
properties of the MIEC (red arrow) are labelled 1, 2 and 3. An electronic and
Li-ion insulator (ELI: yellow) material is used at the root of the MIEC as a ‘binder’
to the solid electrolyte (‘Ductile all-solid-state electrolyte’). α, β and γ are Li$_{\mathrm{BCC}}$
drops that are still recoverable. The boxed area is shown expanded in the inset:
see Methods section ‘Quantitative analysis’ for details.


hydrostatic pressure ($P_{\mathrm{LiMetal}}$) in Li according to the Nernst equation,
and this stress will be transmitted to the surrounding solid structure.
If these electrochemically generated mechanical stresses are not
relieved, Li fingers or wedges may crack the solid electrolyte, through
its grain boundaries or through its lattice. As the crack tip may be closer
to the cathode, there is a transport advantage to the deposition of more
Li at the crack tip, so $P_{\mathrm{LiMetal}}(x)$ is generated again at the crack tip, where x
is spatial position, and the process repeats until electrical shorting
happens. The well-known elastic-modulus-based criterion for mechanical
stability is not applicable when considering this crack-based degradation
mode. The potentially huge electrochemically generated stress
$P_{\mathrm{LiMetal}}(x)$, if not relaxed quickly by creeping of Li, would fracture solid
components. This and the chemical attack leading to SEI production
make the architecture of all-solid-state rechargeable Li metal batteries
difficult to construct, even at a conceptual level.
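
The GPa-level figure can be reproduced from the atomic volume alone. The sketch below assumes the simple work balance P·Ω = e|U| for a singly charged Li ion, which is our reading of the 0.135 eV per GPa conversion quoted above rather than a formula stated explicitly in the text. The function names are hypothetical; the elementary charge is the exact SI value.

```python
E_CHARGE = 1.602176634e-19   # elementary charge in C (also joules per eV)
OMEGA_LI = 21.6e-30          # atomic volume of Li in m^3 (21.6 cubic angstroms)

def volume_in_ev_per_gpa(volume_m3):
    """Express an atomic volume as the work (in eV) done against 1 GPa."""
    return volume_m3 * 1e9 / E_CHARGE

def pressure_from_overpotential(u_volts, volume_m3=OMEGA_LI):
    """Pressure in Pa at which P * Omega equals e * |U| (assumed balance)."""
    return E_CHARGE * abs(u_volts) / volume_m3

print(f"{volume_in_ev_per_gpa(OMEGA_LI):.3f} eV per GPa")          # 0.135
print(f"{pressure_from_overpotential(-0.135) / 1e9:.2f} GPa")      # 1.00
```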

Because Li melts at $T_{\mathrm{M}}$ = 180 °C and is a soft metal, an alternative
concept is to have the Li flow into and out of a 3D tubular structure like
that shown in Fig. 1, keeping contact with a 3D solid host structure
made of mixed ionic-electronic conductor (MIEC) that is absolutely
electrochemically stable against Li metal (that is, having a direct tie-line
to the Li$_{\mathrm{BCC}}$ phase on the equilibrium phase diagram without intervening
phases). Such 3D host structures have been studied experimentally in
the past, but here we seek quantitative mechanistic understanding
for the plating/stripping behaviour. We note that, in our construction,
we choose only the MIEC (not the solid electrolyte) to be in
thermodynamic equilibrium with Li, so it will not generate any SEI upon contact
with Li, removing the possibility of SEI and SEI-based degradation.
At 300 K, the homologous temperature for Li is $T/T_{\mathrm{M}}$ = 0.66, so Li should
manifest an appreciable creep strain rate $\dot{\varepsilon}(T, \sigma)$ (where σ is the
deviatoric shear stress) by dislocation power-law creep or diffusional creep
mechanisms, according to the deformation mechanism map of
metals. Creep imparts an effective viscosity $\eta = \sigma/\dot{\varepsilon}(T, \sigma)$, so the
Li may behave like an ‘incompressible work fluid’, and advancement
and retraction of pure Li may be established inside the MIEC tubules
(which cannot chemically react with the corrosive work fluid), driven
by the chemical potential/pressure gradient $-\Omega\nabla P_{\mathrm{LiMetal}}(x)$. The
competition between interfacial-diffusional Coble creep, bulk diffusional
Nabarro–Herring creep, and hybrid diffusive-displacive dislocation
creep mechanisms depends on the grain size. The pore space helps to
relieve the stresses (hydrostatic and deviatoric) by allowing the
Li$_{\mathrm{BCC}}$ to backfill by diffusion, so the all-solid-state host is not fractured
during cycling (ensuring mechanical stability), while maintaining
high-quality electronic and ionic contacts. While few solid electrolytes with
satisfactory Li-ion conductivity are electrochemically stable against
Li$_{\mathrm{BCC}}$ (ref. 3), there exist many MIECs with such stability and they will
not decompose to form fresh SEI at the MIEC/metal interface. These
include popular anode materials like lithiated graphite or hard carbon
(LiC₆ is an MIEC), Si (Li₂₂Si₅ is an MIEC), Al (Li₉Al₄ is an MIEC) and so on,
as well as materials with appreciable solubility of Li atoms as a random
solid solution (CuLi$_x$) or even bulk-immiscible metals (such as M = Ni,
W) that may nonetheless support some Li solubility at the M/Li$_{\mathrm{BCC}}$ phase
boundary. Here we focus on lithiated carbonaceous materials as the
MIEC ‘rail’ that guides Li$_{\mathrm{BCC}}$ deposition and stripping, although in the
‘Quantitative analysis’ section of Methods we show that this design is
almost independent of MIEC material when using channels about
100 nm wide and 10–100 μm deep.

The cycling of Li under alternating negative and positive overpotential
is rather like the application of a pump, which can produce
fatigue in the solid host structure. To avoid such fatigue, the MIEC walls
should be sufficiently strong and ductile to accommodate the stresses
generated by $P_{\mathrm{LiMetal}}$ and capillarity. Typical graphene foam with too
thin a wall thickness w may not be appropriate because such walls may
easily tear, crumble or fold due to van der Waals adhesion. Also, the
contact condition between the MIEC and the solid electrolyte capping
layer is important, as this is where Li deposition is most likely to occur
initially and where $P_{\mathrm{LiMetal}}$ is initiated. A root or coating of an electronic
and Li-ion insulator (ELI) material like BeO, SrF₂ or AlN (with a bandgap
>4.0 eV, and thermodynamically stable against Li$_{\mathrm{BCC}}$) might be used to
bind the MIEC to the solid electrolyte. A mechanically compliant solid
electrolyte, for example polyethylene oxide (PEO), could be used to
prevent the brittle root-fracture problem.

In the following experiments, we use lithiated carbon tubules
~100 nm wide as the MIEC material. We demonstrate plating/stripping
of Li or Na inside individual carbon tubules in an in situ transmission
electron microscope (TEM) experiment, where a PEO-based polymer
about 50 μm thick was used as the solid electrolyte. The opposite side
of the solid electrolyte was coated with a Li counter-electrode
connected to the scanning tunnelling microscope (STM)/TEM manipulator.
The TEM copper grid (Fig. 2a and Supplementary Fig. 1) serves as
the current collector attached to the carbon tubules on the other end.
The carbon tubule has an inner diameter W of around 100 nm, and its
walls of width w = 20 nm are also nanoporous, as shown in Fig. 2b and
Supplementary Fig. 2a, b.

Figure 2b–d shows TEM images of the Li plating process in a single
carbon tubule with ZnO$_x$ as a lithiophilic agent introduced by controlling
the synthetic process (see Methods and Supplementary Video 1).
Figure 2e, f and Supplementary Video 2 show the changes of
selected-area electron diffraction (SAED) patterns when deposited Li passes
through the original void. After the ring pattern of the carbon tubule
and a period of changing SAED patterns, the SAED (Fig. 2f) stays stable
and shows a strong texture: (110)$_{\mathrm{Li\,BCC}}$ ⊥ tubule axis and (110)$_{\mathrm{Li\,BCC}}$ ∥ tubule
axis (Supplementary Fig. 3 also demonstrates the single-crystal feature).
Moreover, a high-resolution TEM (HRTEM) video captures the
first appearance of the fresh Li crystal, with a 0.248-nm lattice spacing
measured between (110) crystal planes perpendicular to the wall
(Fig. 2g–i and Supplementary Video 3). We decreased the electron
beam current to 0.3 A cm⁻² to maintain the HRTEM image of the Li crystal
for several seconds. The Li can also be stripped along the tubule
(Supplementary Fig. 4) by retracting the Li$_{\mathrm{BCC}}$ tip. The tip can plate and
strip a length of more than 6 μm along the carbon tubule, which was
the largest unblocked length of carbon tubule we could find
(Supplementary Figs. 5 and 6). It can even climb over partial obstructions
inside the carbon tubule (Supplementary Fig. 7). We also discovered
Fig. 2 | Lithium plating/stripping inside carbon tubules. a, In situ TEM set-up:
see Methods for details. b–d, TEM imaging of Li plating with fronts marked by
white arrows (Supplementary Video 1) at increasing time. e, f, SAED changes
from e to f during Li plating (Supplementary Video 2). g–i, HRTEM imaging of a
tubule before plating (g; boxed region shown magnified in h) and after plating
(i, showing first formation of a Li crystal (Supplementary Video 3)). The red
arrow indicates the Li atomic transport direction in deposition, pointing from
the solid electrolyte to the current collector. j–l, TEM imaging of Li stripping
with a void plug between Li and solid electrolyte, with the Li atomic transport
direction indicated by the yellow arrow (from current-collector side to the
solid-electrolyte side), and the surface extent indicated by white arrows
(Supplementary Video 4). Scale bars: b–d, g, j–l, 100 nm; h, i, 2 nm; and
e, f, 5 nm⁻¹.


reversible Li plating/stripping in aligned double carbon tubules
(Supplementary Fig. 8 and Supplementary Video 5). The lithium filling
ratio inside the tubule was estimated by electron energy-loss
spectroscopy (EELS) thickness measurement. The Li K-edge of EELS
(Supplementary Fig. 8c, d) is observed after Li plating at the location shown
by a red cross in Supplementary Fig. 8a. The diameter of the plated
Li$_{\mathrm{BCC}}$ plug is estimated to be 92 nm, which can be compared to the inner
diameter of the tubule, about 100 nm. Li$_{\mathrm{BCC}}$ can also be plated in three
aligned tubules simultaneously (Supplementary Fig. 9).
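
The measured 0.248-nm spacing and the atomic volume quoted earlier are both consistent with b.c.c. lithium, as the short check below illustrates. The lattice parameter a = 3.51 Å is an assumed textbook room-temperature value that does not appear in the text, and the helper names are hypothetical. The last line converts the 92-nm plug and 100-nm bore into a cross-sectional fill fraction, assuming circular cross-sections.

```python
import math

A_LI_BCC = 3.51  # assumed b.c.c. Li lattice parameter in angstroms (not from the text)

def d_spacing_cubic(a, h, k, l):
    """Interplanar spacing of (hkl) planes in a cubic lattice."""
    return a / math.sqrt(h * h + k * k + l * l)

def bcc_atomic_volume(a):
    """Volume per atom in a b.c.c. cell, which holds two atoms."""
    return a ** 3 / 2

def area_fill_fraction(plug_diameter, bore_diameter):
    """Fraction of a circular bore cross-section occupied by a circular plug."""
    return (plug_diameter / bore_diameter) ** 2

print(f"d(110) = {d_spacing_cubic(A_LI_BCC, 1, 1, 0) / 10:.3f} nm")    # 0.248 nm
print(f"volume per atom = {bcc_atomic_volume(A_LI_BCC):.1f} A^3")      # 21.6 A^3
print(f"fill fraction = {area_fill_fraction(92.0, 100.0):.2f}")        # 0.85
```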

We have tested the cycling stability of the carbonaceous MIEC 
tubules by in situ TEM, and found they can maintain excellent 
structural integrity even after 100 cycles of Li plating and stripping 
(Supplementary Video 6, Supplementary Figs. 10, 11). Li can also plate/strip
inside tubules of different sizes, and even within tubules filled
with 3D obstacles (Supplementary Figs. 7, 12-14). Our observations
indicate that the internal Li shape change is not displacive/convective,
but rather a diffusive plating/stripping process onto a front or
fronts, which is much more tolerant of internal obstructions or obstacles.
Similar results for plating/stripping sodium metal are shown in
Supplementary Figs. 15, 16 and Supplementary Videos 7, 8. 


When stripping Li (Supplementary Video 4, Fig. 2j-l and Supplemen- 
tary Fig. 17), we can sometimes create a void plug that grows between 
the residual Li and the solid electrolyte. Yet this gap does not prevent 
the Li from being further stripped, growing the void that separates 
the solid electrolyte from residual lithium. Lithium must therefore be 
extracted from the wall or surface of the MIEC. This excludes disloca- 
tion power-law creep as a major kinetic mechanism, since dislocation 
slip cannot occur in the void, and the residual Li shows little mechani- 
cal translation (convection) in our experiments, although slight local 
sliding cannot be excluded. Therefore, dislocation creep is not the 
dominant creep mechanism. 

To determine the dominant mechanism of Li plating/stripping 
in MIEC tubules (either interfacial-diffusional Coble creep or bulk
diffusional Nabarro–Herring creep), we carried out theoretical calculations
(see Methods section ‘Quantitative analysis’). We considered three
possible paths for Li diffusion: (a) via an MIEC wall of width w (~10 nm);
(b) via the interface between an MIEC wall and Li_BCC, with an atomic
width of δ_interface (~2 Å); and (c) via bulk Li_BCC of width W (~100 nm). We
also considered three canonical MIECs: LiC₆, Li₂₂Si₅ and Li₉Al₄. Generally,
for the cases when an MIEC is thermodynamically stable against
Li_BCC, the calculations show that Li diffusion via the interfacial path
(b) dominates. This means that the MIEC tubule concept is feasible
for Li₉Al₄ and Li₂₂Si₅, or any other electrochemically stable MIEC (for
example, CuLi_x, Ni, W) that forms an incoherent interface with Li_BCC.
In all such cases, the diffusion flux along the 2-Å incoherent interface
between the MIEC and the metal, or over the MIEC surface, dominates
over flux through the 10-nm MIEC wall itself. In other words, ion transport
along the MIEC is dominated by the 2-Å ‘interfacial MIEC channel’,
as illustrated in Fig. 1. This greatly widens the range of material choices 
available for the MIEC, as we can now separate its mechanical function 
from its electron/ion-transport functions. 

Because carbon needs to be lithiated to LiC₆ to become a true MIEC,
we introduced ZnO_x during synthesis of the carbon tubules to improve
their lithiophilicity, which greatly helps the achievement of uniform-quality
MIEC tubules on the first lithiation. We now consider the mechanism
of ZnO_x-induced lithiophilicity. On first lithiation, ZnO_x in the
MIEC undergoes a conversion/alloying reaction to produce Li₂O, as
follows: ZnO_x + (2x + y)Li = ZnLi_y + xLi₂O. But it is experimentally difficult
to obtain TEM images of the post-formation Li₂O directly: the
material is only a few nanometres thick, and located on the inner surfaces
of the carbon tubules. We used an alternative method to observe
the in situ formed Li₂O, namely imaging the outer surface of the carbon
tubules, taking advantage of the homogeneous distribution of ZnO_x
across the carbon tubule wall (Supplementary Fig. 2). During Li plating,
a crystalline Li₂O layer with a thickness of a few nanometres is observed
to be formed along the outer surface of the carbon tubule (Fig. 3a and
Supplementary Fig. 18) like a lubricant. Li₂O seems to be mechanically
soft, despite its crystallinity, and can also deform by diffusional
creep, even at room temperature. If we continue deposition after
the interior of the carbon tubule is fully filled with Li, at some point
the Li will appear outside the carbon tubule. As shown in Fig. 3b-f and
Supplementary Video 9 using dark-field imaging, we observed that
after plating through the nanopores, Li first produces a complete wetting,
rapidly spreading along the outer surface with zero contact angle
up to a distance of 140 nm, before finally pushing downward. This
suggests that the ZnO_x/Li₂O layer on the MIEC surface helps to induce
a strong lithiophilicity.

Finally, to demonstrate a centimetre-scale all-solid-state full-cell
battery, we constructed an MIEC tubular matrix using about 10¹⁰
cylinders, each with an aspect ratio of several hundred, capped by solid
electrolyte (Fig. 4). The counter-electrode is LiFePO₄. To fabricate the
tubular MIEC matrix, we first used chemical vapour deposition (CVD)
to grow a layer of carbon on the inner surface of free-standing anodic
aluminium oxide (AAO) that acted as a template. Next, a layer of Pt was
deposited on the bottom of the AAO by sputtering, to act as the current
collector and mechanical support. Then, the AAO was etched away to
yield the carbonaceous MIEC tubular matrix, as shown in Fig. 4a-d. To
enhance the lithiophilicity of the carbonaceous MIEC tubular matrix,
a 1-nm-thick ZnO layer was deposited onto the surface of the carbon
cylinders by atomic layer deposition (Supplementary Fig. 19). This
construction, several centimetres in extent and 50 μm thick, sits on
the Pt current collector (Supplementary Fig. 20). Indentation tests
show a hardness of about 65 MPa (Supplementary Fig. 21), which is
higher than the internal pressurization limit (see Methods section
‘Quantitative analysis’). We then cap the MIEC tubular matrix by a film of
PEO-based/LiTFSI solid electrolyte, 50 μm thick. A layer of LiPON about
200 nm thick was pre-deposited into the carbon tubules by sputtering
to obstruct the open pores (Supplementary Fig. 22). LiPON has a much
poorer ionic conductivity than PEO-based/LiTFSI solid electrolyte, and
approximates the electron- and Li-ion insulator (ELI) roots shown
in yellow in Fig. 1 that affix MIECs to the solid electrolyte. It also prevents
inflow of the polymeric solid electrolyte into the MIEC tubules
during testing at 55 °C. The cathode was constructed from the active
material LiFePO₄ (60 wt%), polyethylene oxide (PEO, 20 wt%), LiTFSI
(10 wt%) and carbon black (10 wt%). The mass loading is 4-6 mg LiFePO₄
per cm² in full cells. In half cells, we use a superabundant Li metal chip
(more than 100× excess) as the opposite electrode. No (ionic) liquid
or gel electrolyte of any kind was used in our centimetre-scale battery
experiments. For making the full cell with a small lithium inventory
compared to the cathode capacity, we first pre-deposited 1× excess Li
into the MIEC tubules electrochemically from the half cell.

Fig. 3 | Lithiophilicity from ZnO_x. a, HRTEM image of a layer of Li₂O on the
outer surface of a carbon tubule. The inset expands the blue-box region, where
lattice fringes of Li₂O (111) are seen. b-f, Snapshots of dark-field imaging of Li_BCC
wetting the tubule outer surface as a function of time, with b showing the Li_BCC
already plated inside the carbon tubule, c to e showing facile wetting on the
outside, with the spreading distance labelled by yellow arrows, and f showing
final pushing downward (‘Outgrowth’). For the dark-field imaging, the (110)
diffraction beam of the Li crystal shown in b is allowed to pass through the
objective aperture, and the red dashed circle denotes the selected-area
aperture also shown in the inset of b. See Supplementary Video 9. Scale bars:
a, 2 nm; b-f, 100 nm.


Fig. 4 | Electrochemical performance of scaled-up Li metal cell with about
10¹⁰ MIEC cylinders. a-d, Field emission SEM (FESEM; a-c) and TEM (d) images
of the carbonaceous MIEC tubules. e, f, Charge/discharge profiles at 0.1C (e)
and cycling life (f) of the all-solid-state (1× excess) Li_BCC-pre-deposited MIEC/SE/
LiFePO₄ batteries. The magenta (capacity) and blue (coulombic efficiency)
colours indicate the use of 3D MIEC tubules on Pt foil as a Li host, the discharge
capacity of which reaches 164 mA h g⁻¹ at 0.1C and 157 mA h g⁻¹ at 0.2C, while the
green colour indicates the use of 2D carbon-coated Cu foil as a Li host. Scale
bars: a, 1 μm; c, 500 nm; and b, d, 200 nm.


In half-cell tests, we could cycle a large amount of Li with a substantial
areal capacity up to 1.5 mA h cm⁻². Compared with the control experiment
using carbon-coated Cu foil as the Li host, the half cell with the
3D MIEC tubular matrix shows a lower overpotential (39 mV versus
250 mV at 0.125 mA cm⁻²) and a much higher coulombic efficiency
(97.12% versus 74.34% at 0.125 mA cm⁻²), as well as much better cycling
stability (Supplementary Figs. 23, 24). More importantly, in full-cell
tests, with only 1× excess Li pre-deposited inside the MIEC tubules,
the all-solid-state full cell shows a lower overpotential (0.25 V versus
0.45 V), a higher discharge capacity (164 mA h g⁻¹ versus 123 mA h g⁻¹)
and a much higher coulombic efficiency (99.83% versus 82.22%) at
0.1C (Fig. 4e). This full cell shows almost no degradation for more
than 50 cycles (Fig. 4f), and the gravimetric capacity of our Li/MIEC
composite anode reaches a remarkable value of about 900 mA h g⁻¹.
This validates the MIEC architecture for an all-solid-state alkali metal 
battery, which has been taken from mechanistic concepts to quantita- 
tive theory and design to the realm of practice. 
Methods


Synthesis of carbon hollow tubules 

The synthesis of the carbon tubules was similar to that used in our
previous work. 1 g of polyacrylonitrile (PAN, Aldrich) and 1.89 g of
Zn(Ac)₂·2H₂O were dissolved in 30 ml of dimethylformamide (DMF,
Aldrich) solvent to obtain the electrospinning solution. A working
voltage of 17 kV, a flow rate of 0.05 mm min⁻¹, and an electrospinning
distance of 20 cm were used to synthesize the PAN/Zn(Ac)₂ composite
fibres. A layer of zeolitic imidazolate framework (ZIF-8) can be formed
on the surface of the composite fibres by adding them to an ethanol
solution of 2-methylimidazole (0.65 g, Aldrich). The introduction of a
trace amount of cobalt acetate into the composite fibres can promote
the graphitization of the carbon tubules. The synthesized core-shell
composite fibres were heated at 600-700 °C for 12 h to obtain the
hollow carbon tubules with some ZnO_x.


In situ transmission electron microscopy

This was conducted using a JEOL 2010F TEM at 200 kV with a Nanofactory
STM/TEM holder. The solid-state nanobattery contains Li
metal, solid electrolyte and the prepared carbon tubules with ZnO_x.
The Li metal was applied to a tungsten probe in a glove box filled with
Ar gas, and the prepared carbon tubules were adhered to half a TEM
copper grid by silver conductive epoxy. For a typical example of a soft
solid electrolyte, sufficient poly(ethylene oxide) (PEO) and lithium
bis(trifluoromethanesulfonyl)imide (LiTFSI) were dissolved in 1-butyl-
1-methylpyrrolidinium bis(trifluoromethylsulfonyl)imide (ionic liquid).
The Li metal on the tungsten probe was capped by the obtained solid
electrolyte with a thickness of ~50 μm inside the glove box filled with
Ar gas. After loading the battery components into the TEM, the end
with solid electrolyte covering the Li metal on the tungsten probe was
manipulated to make contact with the carbon tubules on the TEM copper
grid to complete the assembly of a nanobattery. Lithium plating
and stripping in the carbon tubules were realized by applying −2 V and
+2 V with respect to the lithium metal.


Electron radiation damage control 

In the in situ TEM experiments, we reduced electron beam damage as
much as possible. Li metal is sensitive to electron beam irradiation in the
TEM, owing to elastic and inelastic scattering. The elastic (electron-nucleus)
scattering can lead to sputtering damage, and the inelastic
(electron-electron) scattering can cause damage by specimen heating
and radiolysis.

In our low-magnification TEM images and videos showing Li plating
and stripping inside the tubule, a low electron beam current density of
around 1.5 mA cm⁻² was used to minimize beam damage. The images
were taken at a slightly underfocused condition to enhance the contrast.
We blanked the beam before recording the video, and limited
experiment recording time to less than 2 min. The beam was blanked
for most of the time while plating and stripping Li, except for some
necessary observations. For taking the SAED patterns of Li plated inside
the tubule, a broad electron beam with a low current density of
1 mA cm⁻² was used. We took the patterns as quickly as possible to lower
the amount of irradiation damage. For taking the HRTEM image of Li
plated inside the tubule, an electron beam current density of around 0.3 A cm⁻²
was used. The HRTEM image captured the fresh Li crystal when it first
appeared inside the camera field, showing the lattice fringes of (110) Li_BCC
planes. The Li lattice fringes remained for several seconds before vanishing
owing to electron beam irradiation damage.

The carbon tubules help to reduce irradiation damage when imaging
Li_BCC inside the tubules (Supplementary Figs. 25-27). As the Li_BCC is
inside the wall of the tubule, this helps to reduce the sputtering loss.
In the case of inelastic scattering, the tubule may also act as a thermal/
electron conductor covering the lithium metal, helping to dissipate some
of the heat generated by electron irradiation. Furthermore, the electrochemical
plating can continually replenish fresh Li_BCC in the region under irradiation,
which may also help the HRTEM imaging.

We have also carried out in situ TEM experiments with the electron 
beam blanked. We first set up the nanobattery inside the TEM, plac- 
ing the selected-area aperture on the still-hollow carbon tubule, and 
turned on the SAED mode in advance. During these steps, we did not 
apply any bias potential (the bias potential is required to deposit Li 
metal inside the carbon tubule). After this preparation, we turned off 
the electron beam (‘blind’ condition). With no electron beam present, 
we applied bias potential for some time to deposit Li metal inside the 
tubule. We then turned on the electron beam (at a low current density
of 1 mA cm⁻²), and immediately the sharp SAED pattern appeared on the
TEM CCD window. The single crystal feature was later identified as the
Li_BCC phase from its measured lattice constant (Supplementary Fig. 28).


Electron energy-loss spectroscopy (EELS) 

The EELS spectra were taken in the STEM mode with a spot size of
1 nm, with a semi-convergence angle of about 5 mrad and a semi-collection
angle of about 10 mrad. For the thickness calculation in
Supplementary Fig. 8, the absolute log-ratio method was used
(t/λ = ln(I_t/I_0), where t stands for thickness, λ stands for effective mean free path,
I_t is the intensity integration under the whole EELS spectrum, and I_0 is
the intensity integration under the zero loss peak). In addition to the
accelerating voltage, semi-convergence and semi-collection angles,
to calculate λ the effective atomic number Z_eff was also needed. We
estimate Z_eff = 6 for the carbon tubule before Li plating. After Li plating,
both Li and the wall of the carbon tubule existed at the location
where we recorded the EELS signal. In this case, we can estimate the
rough atomic ratio between Li and C to be 0.56:1 when considering the
observed geometry of the tubule, with an inner diameter of ~100 nm
and a wall thickness of ~28 nm. Using the formula for the effective atomic
number of a mixed-element specimen, we obtain Z_eff = 5.1 (after Li plating).
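
The Z_eff estimate can be checked with a short calculation. This is only a sketch: it assumes the composition-weighted form Z_eff = Σ f_i Z_i^(1+r) / Σ f_i Z_i^r with r = 0.3, which is commonly used with the log-ratio method but is not spelled out in the text here. The function name and the choice of r are illustrative assumptions.

```python
def effective_atomic_number(atomic_fractions, atomic_numbers, r=0.3):
    """Composition-weighted Z_eff = sum(f * Z**(1 + r)) / sum(f * Z**r).

    The exponent r = 0.3 is an assumed, commonly quoted value; only the
    ratio of the fractions matters, so they need not be normalised.
    """
    pairs = list(zip(atomic_fractions, atomic_numbers))
    numerator = sum(f * z ** (1.0 + r) for f, z in pairs)
    denominator = sum(f * z ** r for f, z in pairs)
    return numerator / denominator


# Carbon wall only (before Li plating): reduces to Z = 6.
z_before = effective_atomic_number([1.0], [6])

# Li:C atomic ratio of 0.56:1 (after Li plating), with Z(Li) = 3 and Z(C) = 6.
z_after = effective_atomic_number([0.56, 1.0], [3, 6])

print(f"Z_eff before plating: {z_before:.2f}")
print(f"Z_eff after plating:  {z_after:.2f}")
```

Under this assumed form the plated case evaluates to about 5.06, which rounds to the quoted value of 5.1.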

The thicknesses before and after Li plating were thus calculated to 
be ~68 nm and ~160 nm, respectively, from the EELS spectra recorded in
Supplementary Fig. 8, and the thickness difference (corresponding to 
the thickness of Li plated) was estimated to be ~92 nm. The background 
contribution under the edge can be estimated from the pre-edge area, 
and the K-edge of Li was obtained by background subtraction. 


Other characterizations 

The synthesized materials were characterized by TEM, high-resolution 
TEM (HRTEM), field emission scanning electron microscopy (FESEM, 
FEI Helios 600 Dual Beam FIB), energy-dispersive X-ray spectroscopy 
(EDX, Oxford) and X-ray photoelectron spectroscopy (XPS, PHI5600). 


Synthesis of MIEC 3D electrode 

First, the chemical vapour deposition (CVD) method was applied, using 
90 sccm C₂H₂ at 640 °C, which grows a layer of carbon onto the inner
surface of anodic aluminium oxide (AAO) that acted as the template. 
Next, a layer of Pt was deposited by sputtering on the bottom of the 
AAO; it acted as the current conductor and as mechanical support. 
Then, the AAO was etched to yield the carbonaceous MIEC tubular 
matrix by employing a 3 M NaOH aqueous solution with a small amount
of ethanol added. To enhance the lithiophilicity of the MIEC tubules,
a 1-nm-thick ZnO layer was deposited onto the inner surfaces of the
MIEC tubular matrix by ALD (atomic layer deposition). 


Li/solid electrolyte/MIEC half cell 

To avoid the inflow of polymeric solid electrolyte into the MIEC tubules 
during testing at 55 °C, a layer of LiPON ~200 nm thick was deposited 
onto the MIEC tubules by sputtering to obstruct the open pores. A
PEO-based/LiTFSI film was used as a typical solid electrolyte (from
KISCO Ltd). The 2032 coin cells were prepared by pressing the MIEC 
tubular matrix onto one side of the solid electrolyte film and a Li metal 
chip onto the other side. The obtained Li/solid electrolyte/MIEC tubu- 
lar matrix half cell was tested at current densities of 0.125, 0.25 and 
0.5 mA cm⁻². The half cells were cycled a few times to stabilize the
interface between solid electrolyte and electrode. The Coulombic 
efficiency was obtained from the ratio of discharge and charge capac- 
ity. For comparison, 2D carbon-coated Cu foil was used to prepare a Li/ 
solid electrolyte/carbon-coated Cu foil cell. 


All-solid-state full cell 

The LiFePO₄ cathode was constructed from LiFePO₄ powder (the active
material, 60 wt%), polyethylene oxide (PEO, 20 wt%), LiTFSI (10 wt%),
and carbon black (10 wt%). The mass loading is 4-6 mg LiFePO₄ per cm².
We predeposited 1× excess Li into the MIEC tubules from the half cell.
The 2032 coin cells were prepared with Li-deposited MIEC tubules as
the anode, the LiFePO₄ electrode as the cathode and solid electrolyte
in an Ar-filled glove box. The all-solid-state battery was tested at 55 °C
with a LAND battery tester between 2.5 V and 3.85 V. We also predeposited
1× excess Li onto the carbon-coated Cu foil of the control battery
before the full cell testing. 


Quantitative analysis 

Internal gas pressure accommodation. For mechanical stability, it is
in practice difficult to construct a vacuum-filled tubular matrix, so we
will assume that initially we have an inert gas phase in the white region
in Fig. 1 with P_gas = 1 atm. The gas-tightness of the solid electrolyte layer
must be guaranteed, because otherwise Li metal will easily plate or flow
through the solid electrolyte, shorting to the cathode. Thus, when Li
metal is deposited inside a tubule, the gas phase must be compressed.
If the current collector (say Cu) and the MIEC walls are also hermetically
sealed, then local P_gas will increase as more and more Li metal is
deposited inside, up to possibly tens of atmospheres (a few MPa) if the
compression ratio is something like 10×. The creeping Li metal can
act as a piston, as we have seen from the Nernst equation that P_LiMetal
can easily reach hundreds of MPa. However, owing to unavoidable
heterogeneities, the amount of Li metal deposited may not be the same
between adjacent cylinders, and this will cause a pressure difference,
ΔP_gas, between adjacent cylinders that can bend the MIEC wall. If the
MIEC wall (red region in Fig. 1) is not mechanically ductile enough,
then at a certain point a cell may burst. For this reason, it is better for
the MIEC wall to be permeable, so P_gas can then equilibrate from cell to
cell. Then the internal pressure will be more homogeneously distributed,
ensuring that the left chamber in Fig. 1 will not expand and crush the
right chamber by bending the wall.


Geometric design. While the in situ TEM experiments give us confidence
that MIEC electrochemical cells work at the ‘single cylinder’ or
‘few-cylinders’ level, transport and mechanical durability issues will
determine how well the cell will work in practice at cm × cm scale, with
a massive number (~10¹⁰) of parallel cylinders. The typical areal capacity
Q and current density J = dQ/dt demanded by industrial applications
are of the order 3 mA h cm⁻² and 3 mA cm⁻², respectively. Typical overpotentials
U of lithium-metal-containing anodes (versus Li⁺/Li) are of
the order of 50 mV. With unavoidable heterogeneities among the ~10¹⁰
cylinders, transport/reaction limitations may vary from location to
location. With P_LiMetal in MPa and U in V, we have max P_LiMetal = 7,410U, so
for U = 50 mV, max P_LiMetal = 370 MPa: the higher the overpotential, the
larger max P_LiMetal, and the more severe the local mechanical degradation
can be. We cannot allow the overpotential U, a global quantity,
to rise too high; but U is still responsible for driving a global average
current density J. This means the average transport conductance should
be better than ~3 mA cm⁻²/50 mV = 0.06 S cm⁻² as an order-of-magnitude
estimate, otherwise the requisite pressure might be too high
and the MIEC tubules may burst somewhere. The effective transport
conductance of the tubular matrix is (κ_MIEC/h) × w/(w + W), where κ_MIEC
(in S cm⁻¹) is an effective Li conductivity, and w/(w + W) is the fill factor
by MIEC (assuming straight pores and tortuosity = 1). In order to get
Q = 3 mA h cm⁻², h needs to be at least ~20 μm, taking into account the
inert host volume (see Supplementary Fig. 29 for calculated capacity
with the tubular matrix geometry). So we get an effective longitudinal
transport requirement:

$$\kappa_{\mathrm{MIEC}} \times \frac{w}{w + W} > 0.06\ \mathrm{S\,cm^{-2}} \times 20\ \mu\mathrm{m} = 0.12\ \mathrm{mS\,cm^{-1}} \tag{1}$$
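
The numbers leading to equation (1) can be reproduced with a few lines. This is a sketch under stated assumptions: the text gives only the numerical coefficient relating pressure to overpotential, and the code assumes it arises as F/V_m with a Li molar volume of about 13.02 cm³ mol⁻¹. That molar volume and all function names are illustrative assumptions, not values taken from the text.

```python
FARADAY_C_PER_MOL = 96485.332   # Faraday constant
LI_MOLAR_VOLUME_CM3 = 13.02     # ASSUMED molar volume of b.c.c. Li, cm^3 per mol


def max_li_pressure_mpa(overpotential_v: float) -> float:
    """Upper bound P = F * U / V_m. J per cm^3 is numerically equal to MPa."""
    return FARADAY_C_PER_MOL * overpotential_v / LI_MOLAR_VOLUME_CM3


def required_areal_conductance(current_a_cm2: float, overpotential_v: float) -> float:
    """Average transport conductance J / U, in S cm^-2."""
    return current_a_cm2 / overpotential_v


def required_conductivity(areal_conductance_s_cm2: float, height_cm: float) -> float:
    """Right-hand side of equation (1): conductance times tubule height, in S cm^-1."""
    return areal_conductance_s_cm2 * height_cm


pressure = max_li_pressure_mpa(0.050)                   # U = 50 mV
conductance = required_areal_conductance(3e-3, 0.050)   # 3 mA cm^-2 driven by 50 mV
threshold = required_conductivity(conductance, 20e-4)   # h = 20 um = 20e-4 cm

print(f"max pressure: {pressure:.0f} MPa")
print(f"conductance:  {conductance:.2f} S cm^-2")
print(f"threshold:    {threshold * 1e3:.2f} mS cm^-1")
```

With these inputs the pressure bound comes out near 370 MPa and the threshold at 0.12 mS cm⁻¹, matching the values quoted above.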


For MIEC, we have bulk contribution

$$\kappa_{\mathrm{MIEC}}^{\mathrm{bulk}} = \frac{e^{2}\, c_{\mathrm{Li}}\, D_{\mathrm{Li}}^{\mathrm{bulk}}}{k_{\mathrm{B}} T} \tag{2}$$

where c_Li (in cm⁻³) is the Li atom concentration, and D_Li^bulk is the tracer
diffusivity of Li atoms in bulk MIEC. We should recognize, however,
that interfacial diffusion might be significant or even dominant with
100-nm-sized MIEC cylinders, as there can be fast diffusion paths of
width δ_interface (typically taken to be 2 Å) at the MIEC/Li_BCC incoherent
phase boundary (red/grey interface in Fig. 1) or surface (red/white
interface in Fig. 1), in which case we need to correct κ_MIEC by the following
size-dependent factor:

$$\kappa_{\mathrm{MIEC}} = \kappa_{\mathrm{MIEC}}^{\mathrm{bulk}} \times \left(1 + \frac{2\, D_{\mathrm{Li}}^{\mathrm{interface}}\, \delta_{\mathrm{interface}}}{D_{\mathrm{Li}}^{\mathrm{bulk}}\, w}\right) \tag{3}$$


With bulk diffusivity data culled from table 2 of ref.”°, we see that among
the three canonical MIECs, LiC₆ (c_Li = 1.65 × 10²² cm⁻³, optimistic
D_Li^bulk ≈ 10⁻⁷ cm² s⁻¹), Li₂₂Si₅ (c_Li = 5.3 × 10²² cm⁻³, optimistic
D_Li^bulk ≈ 10⁻¹¹ cm² s⁻¹) and Li₉Al₄ (c_Li = 4 × 10²² cm⁻³, optimistic
D_Li^bulk ≈ 10⁻⁹ cm² s⁻¹), it looks likely that LiC₆ has the largest c_Li D_Li^bulk.
Putting the values into equation (2), κ_MIEC^bulk(LiC₆) ≈ 0.01 S cm⁻¹. However,
there is large uncertainty in the diffusivity data, so a more conservative
estimate might be D_Li^bulk(LiC₆) ≈ 10⁻⁸ cm² s⁻¹, κ_MIEC^bulk(LiC₆) ≈ 1 mS cm⁻¹. Thus,
the minimum MIEC fill factor for LiC₆ is

$$\frac{w_{\min}}{w_{\min} + W} = \frac{0.12\ \mathrm{mS\,cm^{-1}}}{\kappa_{\mathrm{MIEC}}} \approx 0.1$$

and so if W = 100 nm, one should have minimally w = w_min = 10 nm.
This wall thickness happens to also make sense from a mechanical 
robustness requirement viewpoint. Coincidentally, this geometry 
is quite close to that of our carbon tubule experiment. The design 
above is consistent with the fact that graphite or hard carbon anodes 
used in lithium-ion batteries (LIB) have a film thickness of the order of
100 μm, and the film is known to be able to support a current density of
~3 mA cm⁻² with an overpotential of ~50 mV. Indeed, referencing to an
industrial LIB graphite anode is apt here, because we know they work
near the borderline as an anode in charging: if the current density is
significantly higher than ~3 mA cm⁻², then the local potential would
drop below 0 V versus Li⁺/Li, and Li_BCC would precipitate out, which is a
substantial problem for LIB cycle life and safety with liquid electrolyte.
Here, we are proposing to turn the problem on its head. We want the
Li metal to ‘spill out’ of the MIEC, but in a controlled fashion, inside
the internal tubular cells within a reserved space capped by ELI and
solid electrolyte, without excessive P_LiMetal build-up and cracking of
the solid electrolyte, and without any fresh SEI production (since the 
expanding/shrinking parts are in contact with MIEC and will stop at ELI, 
which are both electrochemically absolutely stable against Li metal, so 
no side reactions are possible electrochemically). Then we only need 
to ensure mechanical integrity of this 3D solid structure of open-pore 
MIECs rooted in solid electrolyte via ELI. 

If there were no interfacial diffusion contribution, Li₉Al₄ might be
a borderline case, with κ_MIEC(Li₉Al₄) = 0.25 mS cm⁻¹ from equation (2),
thus requiring an excessively large MIEC fill factor of

$$\frac{w_{\min}}{w_{\min} + W} = \frac{0.12\ \mathrm{mS\,cm^{-1}}}{\kappa_{\mathrm{MIEC}}} \approx 0.5$$

and requiring w_min = W = 100 nm. Such a large fill factor is unlikely to
be competitive against a graphite anode. Lastly, the bulk Li diffusivity
value (D_Li^bulk ≈ 10⁻¹¹ cm² s⁻¹) for Li₂₂Si₅ is totally unworkable, because
κ_MIEC(Li₂₂Si₅) = 0.003 mS cm⁻¹, and we cannot satisfy the transport
requirement, equation (1). We conclude therefore that if bulk diffusion
alone operates in the MIEC, D_Li^bulk ≈ 10⁻⁸ cm² s⁻¹ would be workable,
D_Li^bulk ≈ 10⁻⁹ cm² s⁻¹ would be difficult, and anything lower would be
impossible.
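
The bulk-only estimates above follow from equation (2) and the 0.12 mS cm⁻¹ threshold of equation (1). The sketch below evaluates them for the three canonical MIECs using the concentrations and diffusivities quoted in the text. The temperature of 300 K and the function names are assumptions made for illustration.

```python
E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J K^-1


def bulk_conductivity(c_li_cm3: float, d_bulk_cm2_s: float, temperature_k: float = 300.0) -> float:
    """Equation (2): kappa = e^2 * c * D / (k_B * T), in S cm^-1. T = 300 K is assumed."""
    return E_CHARGE ** 2 * c_li_cm3 * d_bulk_cm2_s / (K_B * temperature_k)


def min_fill_factor(kappa_s_cm: float, required_s_cm: float = 0.12e-3) -> float:
    """Smallest w/(w + W) satisfying equation (1); a value above 1 means infeasible."""
    return required_s_cm / kappa_s_cm


# (c_Li in cm^-3, D_bulk in cm^2 s^-1); LiC6 uses the conservative diffusivity.
materials = {
    "LiC6": (1.65e22, 1e-8),
    "Li9Al4": (4e22, 1e-9),
    "Li22Si5": (5.3e22, 1e-11),
}

results = {}
for name, (c_li, d_bulk) in materials.items():
    kappa = bulk_conductivity(c_li, d_bulk)
    results[name] = (kappa, min_fill_factor(kappa))
    print(f"{name}: kappa = {kappa * 1e3:.4f} mS cm^-1, fill factor >= {results[name][1]:.2f}")
```

The unrounded fill factors are slightly larger than the rounded 0.1 and 0.5 in the text, and the Li₂₂Si₅ value exceeds 1, which is the sense in which bulk diffusion alone is unworkable for that material.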

Experimentally, when stripping Li metal (Fig. 2j-l, Supplementary
Fig. 17 and Supplementary Video 4), we can sometimes create a void 
plug that grows between the residual Li metal and the solid electrolyte. 
Yet this gap does not prevent the Li metal from being further stripped 
in the experiment, growing the void that separates solid electrolyte 
from the residual lithium. Li metal must therefore flow out from the 
MIEC wall/surface. This then excludes dislocation power-law creep as 
a major kinetic mechanism, since dislocation slip cannot occur in the
void, and the residual Li metal shows very little mechanical translation 
(convection) in our experiments, although slight local sliding cannot 
be excluded. Based on our in situ TEM observations, therefore, the Li 
metal must be in the Coble creep regime. However, this does not determine
whether the Li is transported along the MIEC interior of width w,
or along the MIEC/Li metal interface (Supplementary Fig. 4 case) or
over the MIEC surface (Fig. 2j-l case) of width δ_interface, and then plated
to the tip of the Li metal via Li metal surface diffusion, as illustrated in
Fig. 1. A theoretical bound is necessary. According to NMR measurements,
bulk b.c.c. Li metal has D_Li^bulk ≈ 4 × 10⁻¹¹ cm² s⁻¹ at room temperature,
which we know from the calculations above is two orders of
magnitude too sluggish to support the observed Li metal kinetics. For
surface diffusivity of Li on b.c.c. Li metal, the empirical formula

$$D^{\mathrm{surface}} = 0.014 \exp\left(-6.54\, T_{\mathrm{M}}/T\right)\ \mathrm{cm^{2}\,s^{-1}} \tag{4}$$

has been verified to work quite well for monatomic metals. For instance,
Sn, another low-melting-point metal (T_M = 232 °C), was found to have
surface diffusivity D^surface = 1 × 10⁻⁷ cm² s⁻¹ at room temperature by direct
mechanical creep deformation experiments, while equation (4) predicts
2 × 10⁻⁷ cm² s⁻¹. Equation (4) predicts D_Li^surface = 7 × 10⁻⁷ cm² s⁻¹ in
b.c.c. Li at room temperature. This is 70× larger than that of
D_Li^bulk ≈ 10⁻⁸ cm² s⁻¹ in LiC₆. The geometry factor 2δ_interface/w, on the other
hand, is of the order of 4 Å/10 nm = 1/25. Thus, if one takes an optimistic
estimate that D_Li^interface ≈ D_Li^surface, then the interfacial diffusion contribution
can be 3× that of the bulk MIEC diffusional contribution even
for LiC₆. The MIEC/Li metal phase boundary has a lower atomic free
volume than the free Li metal surface, so we expect D_Li^interface could be
a factor of a few smaller than D_Li^surface = 7 × 10⁻⁷ cm² s⁻¹. Experimental diffusivity
data for metals suggest that D_Li^interface ≈ 2 × 10⁻⁷ cm² s⁻¹. Thus,
D_Li^interface will definitely dominate over bulk MIEC diffusion for Li₉Al₄
and Li₂₂Si₅, as the ratio D_Li^interface/D_Li^bulk (200 for Li₉Al₄ and 20,000 for
Li₂₂Si₅) easily overwhelms the geometric factor 2δ_interface/w (1/25 for
w = 10 nm). The bulk MIEC contribution can thus be ignored for the
electrochemical design, and regardless of MIEC choices we predict an
effective κ_MIEC ≈ 1 mS cm⁻¹, which would satisfy the longitudinal transport
requirement, equation (1), for an MIEC fill factor of w/(w + W) = 0.1.
This predicts that the MIEC tubule concept actually becomes feasible
even for Li₉Al₄ and Li₂₂Si₅ or any other electrochemically stable
MIEC, since diffusion flux along the δ_interface ≈ 2 Å MIEC/metal incoherent
interface or the MIEC surface dominates over the 10 nm MIEC itself.
This recognition greatly liberates the MIEC material selection
choices, as we can now separate its mechanical function from its ion-transport
function. In other words, ion transport along the MIEC is
dominated by an ‘interfacial MIEC channel’ along δ_interface, as illustrated
in Fig. 1.
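
The comparison between interfacial and bulk transport can be sketched numerically from equation (4) and the second term of equation (3). The melting points (453.65 K for Li, 505.15 K for Sn), the room temperature of 298 K and the function names are assumptions made for this illustration; the diffusivities are the values quoted in the text.

```python
import math


def surface_diffusivity(t_melt_k: float, temperature_k: float = 298.0) -> float:
    """Equation (4): D_surface = 0.014 * exp(-6.54 * T_M / T), in cm^2 s^-1."""
    return 0.014 * math.exp(-6.54 * t_melt_k / temperature_k)


def interface_to_bulk_ratio(d_interface: float, d_bulk: float,
                            delta_nm: float = 0.2, wall_nm: float = 10.0) -> float:
    """Second term in equation (3): 2 * D_interface * delta / (D_bulk * w)."""
    return 2.0 * d_interface * delta_nm / (d_bulk * wall_nm)


d_surface_li = surface_diffusivity(453.65)   # ASSUMED melting point of Li, K
d_surface_sn = surface_diffusivity(505.15)   # 232 degrees C expressed in K

# Optimistic case for LiC6: D_interface taken equal to the quoted D_surface.
ratio_lic6 = interface_to_bulk_ratio(7e-7, 1e-8)
# Quoted D_interface = 2e-7 cm^2 s^-1 against the bulk values for the alloys.
ratio_li9al4 = interface_to_bulk_ratio(2e-7, 1e-9)
ratio_li22si5 = interface_to_bulk_ratio(2e-7, 1e-11)

print(f"D_surface(Li) = {d_surface_li:.1e} cm^2 s^-1")
print(f"D_surface(Sn) = {d_surface_sn:.1e} cm^2 s^-1")
print(f"interface/bulk: LiC6 {ratio_lic6:.1f}, Li9Al4 {ratio_li9al4:.0f}, Li22Si5 {ratio_li22si5:.0f}")
```

A ratio above 1 means the interfacial channel carries more flux than the 10-nm wall, which holds for all three materials under these inputs.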
Prussian blue analogues (PBAs) are a diverse family of microporous inorganic solids,
known for their gas storage ability¹, metal-ion immobilization², proton conduction³,
and stimuli-dependent magnetic⁴,⁵, electronic⁶ and optical⁷ properties. This family of
materials includes the double-metal cyanide catalysts⁸,⁹ and the hexacyanoferrate/
hexacyanomanganate battery materials¹⁰,¹¹. Central to the various physical properties
of PBAs is their ability to reversibly transport mass, a process enabled by structural
vacancies. Conventionally presumed to be random¹²,¹³, vacancy arrangements are
crucial because they control micropore-network characteristics, and hence the
diffusivity and adsorption profiles¹⁴,¹⁵. The long-standing obstacle to characterizing
the vacancy networks of PBAs is the inaccessibility of single crystals¹⁶. Here we report
the growth of single crystals of various PBAs and the measurement and interpretation 
of their X-ray diffuse scattering patterns. We identify a diversity of non-random 
vacancy arrangements that is hidden from conventional crystallographic powder 
analysis. Moreover, we explain this unexpected phase complexity in terms of a simple
microscopic model that is based on local rules of electroneutrality and 
centrosymmetry. The hidden phase boundaries that emerge demarcate vacancy- 
network polymorphs with very different micropore characteristics. Our results 
establish a foundation for correlated defect engineering in PBAs as a means of 
controlling storage capacity, anisotropy and transport efficiency. 


The true crystal structures of PBAs (and of Prussian blue itself) have
long posed a difficult and important problem in solid-state chemistry
because their ostensibly simple powder diffraction patterns (Fig. 1a)
belie a remarkable complexity at the atomic scale. The common
parent structure is based on the cubic lattice and corresponds to the
idealized composition M[M′(CN)₆]. Atoms of type M and M′ (usually
transition-metal cations) occupy alternate lattice vertices and are octahedrally
coordinated by bridging cyanide ions (CN⁻) at the lattice edges
(Fig. 1b, left). There is a close parallel to the double perovskite structure;
indeed the considerations of covalency and octahedral coordination
that stabilize perovskites among oxide ceramics also favour this same
architecture for transition-metal cyanides, accounting for the chemical
diversity of PBAs. Charge balance requires that the formal oxidation
states of M and M′ sum to six, as in Cd^II[Pd^IV(CN)₆] (ref. ”").

Prussian blue itself is a mixed-valence cyanide of iron in its 2+
and 3+ oxidation states, and so its composition cannot respect
this oxidation-state-sum rule. Instead the rule is circumvented by
vacancies: the composition is well approximated by the formula
Fe^III[Fe^II(CN)₆]₃/₄□₁/₄·xH₂O, where the symbol □ represents a [Fe^II(CN)₆]⁴⁻
vacancy¹⁶. Vacancies are usually filled by water molecules, which complete
the coordination sphere of the neighbouring M cations; we use
the term ‘vacancy’ to encompass the possible occupancy of the M′ site
with water. Each vacancy gives rise to a micropore with an effective
diameter of approximately 8.5 Å that exceeds the distance between
neighbouring M′ sites (a/√2 = 7.2 Å). Hence a pair of neighbouring
vacancies, if present, connects to form a larger micropore. A random
vacancy distribution would imply bulk microporosity, because the
vacancy fraction exceeds the percolation threshold for the face-centred
cubic (fcc) M′ sublattice (about 0.20). But Prussian blue is not
microporous: single-crystal X-ray diffraction has shown that vacancies
tend to avoid one another by adopting a specific ordered arrangement
(Fig. 1b, centre)¹⁶. A vacancy fraction of 1/4 is the greatest that can support
complete vacancy isolation.

PBAs with a nominal composition of M(II)[M’(III)(CN)₆]₂/₃□₁/₃·xH₂O 
(hereafter M[M’]) contain an even higher fraction of M’-site vacancies. 
Hence geometry dictates that these vacancies, whatever their 
distribution, must form connected neighbour pairs (Fig. 1b, right). The 
existence and nature of any extended micropore network that then 
develops depends on longer-range vacancy correlations. The 
collective micropore structure of PBAs is remarkably poorly understood, 
despite the relevance of mass transport to the many important 
properties of the family. We do know the following: adsorption isotherm 

Fig. 1 | Structure of PBAs. a, PXRD pattern of Mn[Co], typical for PBAs. b, The 
parent structure type (left) comprises interpenetrating fcc arrays of M and 
M’ cations (pink and blue spheres, respectively), bridged by cyanide ions (black 
rods). In Prussian blue (centre), one-quarter of the M’ sites are vacant, creating 
isolated micropores (green spheres) that are usually occupied by water. In 


measurements reflect a diversity of pore characteristics across PBAs; 
solid-state Cd NMR measurements have evidenced non-statistical 
vacancy distributions in Cd[FeₓCo₁₋ₓ] (ref. 7%); weak primitive 
superlattice reflections have been observed only sometimes in powder X-ray 
diffraction (PXRD) patterns, and their presence has usually been interpreted 
as evidence for (partial) Prussian-blue-type vacancy order; 
high-resolution transmission electron microscopy has revealed vacancy chains 
in some copper-containing PBAs and their absence in zinc-containing 
samples; and in the only existing single-crystal diffraction study of a 
PBA (namely Mn[Mn]), structured diffuse scattering was observed and 
interpreted in terms of Warren-Cowley correlation parameters. 
Taken together, these observations suggest that vacancy distributions 
are unlikely to be random, and that there must be substantial variability 
in the pore networks of different PBAs. 

In this study, we have characterized vacancy correlations in a range 
of PBAs by growing single crystals, measuring their X-ray diffuse scat- 
tering patterns, and interpreting these patterns via a three-dimensional 
difference pair distribution function (3D-ΔPDF) analysis and Monte 
Carlo simulations. 

For every crystal we investigated, the corresponding X-ray diffraction 
pattern contained weak but highly structured diffuse scattering, which 
is the hallmark of strongly correlated disorder. Representative (hk0) 
cuts of our diffuse scattering patterns are shown for a selection of PBAs 
in Fig. 2, where we also include the single-crystal diffuse scattering 
pattern of Mn[Mn], the only other such pattern ever reported for a 
PBA. The inverse Fourier transform of the normalized diffuse 
scattering function yields the 3D-ΔPDF. The form of all of our 3D-ΔPDFs is 


Fig. 2 | Single-crystal diffuse scattering from PBAs. Reconstructed (hk0) 
scattering planes are shown here for eight PBA samples (−6 ≤ h, k ≤ 6). The 
data for Mn[Mn] are those reported in ref.*. At the bottom-right corner of each 
panel is the diffuse scattering pattern averaged over all squares with δh, δk = 2 
in the (hk0) scattering plane. Intensities near the Bragg positions with even h, k 
in the corners of the squares have been removed. Note the fundamental 
difference in information content of these single-crystal data relative to PXRD 
traces of the same materials; compare with Fig. 1a. 


PBAs (right), one-third of the M’ sites are vacant. There are now sufficiently 
many vacancies that neighbouring pores must connect (dark green collars) to 
give an extended micropore network. The characteristics of this network 
depend on vacancy correlations. 


well described by a convolution of the contribution from an individual 
[M’(III)(CN)₆]³⁻ anion together with an occupational correlation function. 
Hence the diffuse scattering we observe arises from correlations in 
[M’(III)(CN)₆]³⁻ occupancy instead of from any alkali cation or solvent 
inclusion. The scattering is also predominantly elastic, because the 
3D-ΔPDF is dominated by occupational correlations and not the 
signature of cooperative displacements. In PXRD measurements, 
orientational averaging conceals the diffuse scattering within the background 
or causes it to resemble primitive superlattice reflections; it is in this 
sense that the vacancy correlations from which the diffuse scattering 
arises are ‘hidden’. 

We find a surprising diversity of diffuse scattering patterns among 
the studied PBAs. This is true even for crystals with the same nominal 
composition but grown separately (the example in Fig. 2 is a pair of 
Mn[Co] crystals grown in different media). So our experimental data 
unambiguously show that the vacancies in PBAs are distributed in a 
highly non-random manner, and that these distributions can be fun- 
damentally different for different samples. 

It remains to show how we might understand this diversity, and what 
the implications are for mass transport in PBAs. To do so, we have 
developed a very simple vacancy interaction model that is nevertheless 
capable of rationalizing the various experimental diffuse scattering 
patterns. Monte Carlo simulations driven by this set of interactions 
generate representative pore-network configurations for each phase 
that can then be used to determine physical properties of relevance to 
mass transport and storage. Our model contains just two components, 
each based on simple crystal-chemical considerations. The first favours 
a uniform vacancy distribution, such that for each M site four of the 
six neighbouring M’ sites are occupied and two are vacant. This 
contribution reflects Pauling’s ‘electroneutrality’ principle. The second 
component favours locally centrosymmetric arrangements, which we 
expect to have greater or lesser importance depending on the M-site 
chemistry. 

Formally, we represent the Monte Carlo energy by 
where the sum is taken over all M sites at positions r, with the 
neighbouring M’-site states e(r + r’) = 0 (vacant) or 1 (present), and J₁, J₂ > 0 
quantifying the strength of the electroneutrality and centrosymmetry terms, 
respectively. The occupancy fraction ⟨e⟩ = 2/3. The quadratic form of 
the electroneutrality component comes from the leading term in the 
series expansion in local charge at the M site. We performed Monte 
Carlo simulations for a range of J’ = J₁/J₂ ratios and effective 
temperatures T’ = T/J₂. Our results are shown in Fig. 3a, represented in terms of 
the single-crystal X-ray diffuse scattering patterns calculated from an 

Fig. 3 | Vacancy network phase diagram. a, Monte Carlo diffuse scattering 
map with experimental plane-averaged scattering superimposed (squares). 

b, Distribution of PBAs (left) and vacancy polymorphs (I-VI) demarcated by 
Monte Carlo specific-heat anomalies (black circles) and a morphotropic phase 
boundary (red line). Lines are guides to the eye. c, Centrosymmetric and 
pseudotetrahedral M-site geometries. d, Thermodynamic and micropore 


ensemble of 40 Monte Carlo configurations for each point across an 
evenly distributed mesh of J’, log(T’). 
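
The two-term model described above can be sketched in code. The following is a minimal illustration rather than the implementation used for the simulations: it assumes an electroneutrality term of the form J₁(n − 4)² for each M site, where n is the number of occupied neighbouring M’ sites, and a centrosymmetry term J₂(e₊ − e₋)² summed over the three pairs of opposite neighbours. The rock-salt grid layout, the function name `mc_energy` and the convention of counting each opposite pair once are all assumptions made for illustration.

```python
import numpy as np


def mc_energy(occ, j1=1.0, j2=1.0):
    """Energy of a vacancy configuration under the assumed two-term model.

    occ is a cubic array with even side length and periodic boundaries.
    Grid points with odd i+j+k are M' sites (1 = present, 0 = vacant);
    points with even i+j+k are M sites, whose entries are ignored.
    """
    parity = np.indices(occ.shape).sum(axis=0) % 2
    m_sites = parity == 0
    e = np.where(m_sites, 0.0, occ)  # keep only M'-site occupancies

    neighbours = np.zeros(occ.shape)
    asymmetry = np.zeros(occ.shape)
    for axis in range(3):
        plus = np.roll(e, -1, axis=axis)   # occupancy at r + r'
        minus = np.roll(e, 1, axis=axis)   # occupancy at r - r'
        neighbours += plus + minus
        asymmetry += (plus - minus) ** 2

    electroneutrality = (neighbours - 4.0) ** 2  # zero when 4 of 6 occupied
    return float(j1 * electroneutrality[m_sites].sum()
                 + j2 * asymmetry[m_sites].sum())


# Example: a 4 x 4 x 4 grid (32 M sites) with every M' site occupied.
full = np.ones((4, 4, 4))
print(mc_energy(full))
```

In this sketch the ratio `j1 / j2` plays the role of J’, and a Metropolis simulation would compare `mc_energy` before and after each trial swap of an occupied and a vacant M’ site.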

The phase behaviour given by this simple Monte Carlo model is 
remarkable for a number of reasons. Clearly the form of the diffuse 
scattering (and, as we will come to see, of the vacancy-network 
topology) is an extremely sensitive function of J’ and T’. This observation 
mirrors our experimental results: namely, that small variations in 
synthesis conditions or PBA composition strongly affect the diffuse 
scattering. Such sensitivity arises because the two interaction terms 
of electroneutrality and centrosymmetry operate in tension: they are 
resolvable when the vacancy fraction is 1/4 (giving the ordered Prussian 
blue vacancy arrangement shown in Fig. 1b; compare with sample in 
ref.'°), but become frustrated as the vacancy fraction increases. Hence 
the crystal-chemical considerations embedded in equation (1) drive 
an unexpectedly complex configurational landscape for PBAs with 
an M’-site vacancy fraction of 1/3. We note the parallel to geometric 
frustration in relaxor ferroelectrics (for example, Pb(Mg₁/₃Nb₂/₃)O₃) 
and relaxor ferromagnets (for example, La(Sb₁/₃Ni₂/₃)O₃), where the 
problem of 1:2 decoration of the fcc lattice is also central. 

The experimental diffuse scattering patterns given in Fig. 2 are well 
approximated by our Monte Carlo simulations at different values of 
J’ and T’ (Fig. 3a, b). The implication is that electroneutrality and 
centrosymmetry are alone sufficient to account for the basic form and 
diversity of the diffuse scattering patterns that are observed experimentally. 

network characteristics: normalized Monte Carlo energy E’; Monte Carlo 
energy gradient log(ΔE/ΔT); anisotropy σ = Σ(I − Ī)², where I and Ī are the diffuse 
scattering intensities before and after Laue symmetrisation; scattering 
localization L = log[Σ(I²)/(ΣI)²]; surface-accessible vacancy fraction x_acc; 
conductance C; vacancy-neighbour pairs per formula unit p; and tortuosity τ. 


But what determines J’ and T’ for a given system? PBAs with 
Jahn-Teller-active M-site cations (such as Cu[Co]) correspond to smaller 
values of J’, which is sensible because crystal field effects must increase 
the relative importance of the J₂ term. By contrast, crystal-field-inactive 
M-site cations correspond to larger J’; the larger values for Zn[Co] and 
Cd[Co] probably reflect the empirical propensity of Zn and Cd to adopt 
acentric coordination geometries in their pseudobinary cyanides 
and rhombohedral PBAs (Fig. 3c). So PBA composition controls J’, 
with M-site chemistry more important than that of the M’ site. Solid 
solutions will probably span the range of J’ values bounded by the 
corresponding endmembers, which in the case of Cu/Zn mixtures renders 
most of J’ space accessible synthetically. The effective Monte Carlo 
temperature T’ appears not to be driven by composition but reflects 
instead the precursor concentration and crystal growth rate (high 
T’ = rapid precipitation and/or high oversaturation). Our Mn[Co] and 
Mn[Co]’ samples are associated with similar J’ but different T’, with 
the lower T’ value for the slower-grown sample (by gel diffusion). By 
reducing the synthesis temperature and/or the precursor 
concentrations, it may prove possible to access log T’ values lower than those we 
report here. So, from a synthetic viewpoint, there is genuine scope for 
navigating much of the J’ and T’ space through judicious choice of the 
PBA chemistry (J’) and synthesis approach (T’). 

Just as the calculated diffuse scattering patterns are 
unexpectedly diverse for our Monte Carlo configurations, so too are the 

Fig. 4 | Statistical properties of micropore networks. Network coordination 
number and geometry distributions are given as interior and exterior pie 
charts, respectively, for representative configurations drawn across the phase 
diagram shown in Fig. 3. For each coordination number ≥2, coordination 
geometries related to square-planar networks are shaded pink; those related to 


corresponding vacancy-network structures. Despite their consider- 
able disorder, these networks have meaningfully different physical 
characteristics that we discuss in greater detail below. At the simplest 
level, different configurations have vacancy networks with different 
coordination number and geometry distributions (Fig. 4 and Extended 
Data Fig. 1). Low values of J’ give networks dominated by square-planar 
nodes; at large J’, we find low-dimensional motifs based on 120° 
zig-zag chains instead. High T’ favours a greater diversity of network 
geometries and low effective temperatures stabilize uniform vacancy 
networks and/or phase segregation. 

Collectively, the different scattering patterns and micropore geom- 
etries identify previously unknown phase domains of distinct vacancy- 
network polymorphs, the boundaries between which are hidden from 
conventional PXRD analysis (Fig. 3b). These boundaries emerge from our 
Monte Carlo analysis either from anomalies in the Monte Carlo energy 
gradient ΔE/ΔT, or by variation in anisotropy. The I/II, I/V and II/III 
transitions are examples of the former; III/IV and IV/V are examples of the latter. 
Despite the differences in diffuse scattering patterns (and pore-network 
characteristics) throughout phase I, the entire phase field can be 
navigated without any discontinuity in energy or its derivative, or in 
anisotropy. Phase II is actually a physical mixture of separate components with 
1/4 and 1/2 vacancy fractions: one has the Prussian blue structure, and the 
other is layered with tetragonal symmetry. Given this admixture, we do not 
expect the phase to be relevant to PBA chemistry. Phase V is also 
tetragonal, but is heavily disordered and may well be relevant to PBAs. Phase 
IV represents a competing mixture of (isotropic) III and (anisotropic) V 
components, and includes a morphotropic phase boundary. Phase VI 



tetrahedral networks are shaded green; all others are collated for a given 
coordination number and shaded in grey. Coordination geometries are given at 
the top right: empty and filled circles denote occupied and vacant M’ sites, 
respectively, and bold lines show connecting channels. Note the general 
preference for 90° pore angles at low J’ and 120° angles at high J’. 


contains a tetragonally ordered array of zig-zag pores. Our confidence in 
the detail of the phase diagram between phases I and VI is reduced by the 
difficulty of Monte Carlo equilibration at such low T’ values. 

In Fig. 3d we show a range of physical quantities calculated from our 
Monte Carlo configurations as a function of J’ and T’. Some of these (for 
example, the Monte Carlo energy gradient log(ΔE/ΔT) or the diffuse 
scattering localization L = log[Σ(I²)/(ΣI)²]) serve primarily to highlight 
the phase boundaries, but others are particularly relevant to the 
transport properties of the PBA phases. For example, the tortuosity τ is a 
measure of the curvature of the internal pore space. It affects the rate 
of mass transport (that is, the conductance): 

$$C \propto \frac{p}{\tau} \qquad (2)$$

where p is the number of vacancy neighbour-pairs per formula unit. 
We find that C varies by a factor of two within the high-temperature 
disordered phase I and by yet another factor of two upon progression 
into lower-temperature phases. Even accessible pore volumes vary 
substantially: we calculate differences of greater than 25% for this 
same family of configurations. Moreover, in the anisotropic phases 
II, V and VI, transport depends on orientation. 

This unexpected variability in micropore characteristics helps 
to explain the irreproducibility and diversity of sorption and storage 
properties observed experimentally. But it also highlights the 
opportunity for property optimisation via synthetic control over vacancy 
correlations, that is, defect engineering. For example, the value of 
C should be maximized for battery materials, which might be achieved 
by combining low J’ (for example, M = Cu²⁺) and high T’ (rapid 
precipitation). This analysis is consistent with the empirical identification 
of polycrystalline Cu[Fe] (CuHCF) as a high-performance battery 
material. 

Our results identify many future challenges. We have focused on 
single-crystal samples because of the relative insensitivity of PXRD to 
the vacancy polymorphism of this family. So establishing a robust link 
between vacancy correlations and, for example, ion-storage capacity 
in hexacyanoferrates will require innovative approaches to measuring 
and interpreting diffuse scattering from microcrystalline samples. 
Serial femtosecond crystallography or electron diffraction may help. 
With access to vacancy-network models, it is possible that the 
prospective sensitivity of powder pair distribution function measurements 
to vacancy correlations may now be exploited. Our analysis has also 
been intentionally simplistic: we have not needed to invoke the role of 
alkali-metal inclusion, nor have we considered M’ chemistry or 
cooperative Jahn-Teller effects. Yet these additional degrees of freedom must 
allow further chemical control over pore network characteristics. An 
obvious opportunity is to extend the phase fields of Fig. 3a as a function 
of alkali cation concentration; we might anticipate simpler behaviour 
as the vacancy fraction reduces. There are many variables that might be 
exploited in tailoring PBA network structures—for example, concentra- 
tion, pH, crystal growth rate and media, temperature, speciation, solubil- 
ity, competing ions and chelation—and establishing rules that link these 
variables to vacancy polymorphs represents an enormous challenge. 
Vacancy-network polymorphism may affect magnetic order, spin-state 
transitions, orbital order and photophysics. Moreover, any mechanistic 
understanding of double-metal cyanide catalysis will require charac- 
terization of particle surface structure, which may be substantially 
more complex than previously anticipated. And, stepping back, we 
might ask whether a similar hidden polymorphism plays a role in other 
microporous phases, such as metal-organic frameworks or zeolites. 

Methods 


Single-crystal growth 

Single-crystal PBA samples were in all but one case grown using 
slow-diffusion methodologies. The exception was the Mn[Mn] crys- 
tal, for which we used the same sample reported in ref. *. All other 
samples were prepared by counterdiffusion of aqueous solutions of 
a potassium hexacyanometallate(III) and a divalent transition-metal 
nitrate, chloride, sulphate or acetate. Precursor salts were used as 
supplied and without further purification; the identity and quantity 
of these salts are summarized in Extended Data Table 1. Crystals 
of Cd[Co], Mn[Co]’, Mn[Fe] and Zn[Co] were grown in H-cells, and 
those of Mn[Co], Cu[Co], and Co[Co] were grown from silica gel. 
For H-cell diffusion reactions, aqueous solutions of precursor salts 
(0.5 ml) were placed in the opposite arms of a glass H-cell. The cell 
was then filled with water, taking care not to disturb the solution- 
water interface. The H-cell was sealed and left undisturbed. Crystal 
growth was typically completed within 1-3 weeks. For gel-diffusion 
reactions, a transition-metal-impregnated gel was first prepared. 
Aqueous solutions of Na₂SiO₃ (2 M, 2.5 ml), M(II) salt (0.01-0.15 M, 
2.5 ml), and acetic acid (2 M, 5 ml) were combined in a 15-ml 
centrifuge tube. The gel was allowed to set overnight. An approximately 
stoichiometric quantity of K₃[M’(CN)₆] was then dissolved in water 
(2.5 ml), and the corresponding solution layered carefully above the 
gel. The tube was sealed and the system left undisturbed. Crystal 
growth was typically completed within 3-7 d, with the first crystals 
appearing within 24 h. In all cases, care was taken not to dehydrate 
our samples. 


Single-crystal diffuse scattering 

Single-crystal diffuse scattering measurements were carried out using 
the I19 beamline at the Diamond Light Source (UK) and the BM01 
beamline at the European Synchrotron Radiation Facility (France). 
Both beamlines are equipped with Pilatus 2M area pixel-counting 
detectors. The same data-acquisition strategy was used at both beam- 
lines and consisted of a single 360° rotation scan around the omega 
axis. The key experimental parameters are summarized in Extended 
Data Table 2. 

Determination of the crystal orientation and indexing and 
integration of the Bragg peaks were carried out using the package XDS. 
Reconstruction of three-dimensional diffuse scattering was performed 
using the program Meerkat. Air scattering was measured on an empty 
instrument, reconstructed in the same way as the signal and then 
subtracted from the signal. The diffuse scattering data so obtained were 
then averaged in the m-3m Laue group. The reconstruction of the 
Mn[Mn] dataset from ref. *' followed exactly the same procedure, using 
the original data frames. 


Preparation of diffuse scattering inset images 

The single-crystal X-ray diffuse scattering patterns shown in the 
insets of Fig. 2 were prepared from the three-dimensional scattering 
reconstruction using the projection method of ref. °°. First, the hk0 
section of each dataset was extracted, then the Bragg peaks were 
cut away, along with all surrounding thermal diffuse scattering. 
Next, the diffuse scattering was projected onto a single Brillouin 
zone. For this projection, square diffuse scattering patches with 
the diagonal corners (h, k) and (h + 2, k + 2) with h = 2n, k = 2m were 
taken and summed together. In the case of binary disorder, this 
procedure enables the removal of the contribution of the molecular 
form factor from the diffuse scattering. The resulting projection 
contains information only regarding the distribution of defects and 
does not include information about the chemical composition of 
those defects. This process is what enables direct comparison to 
the simulated diffuse scattering patches calculated from the Monte 
Carlo configurations. 
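
The projection step can be illustrated with a short sketch. It assumes that the hk0 plane is stored as a two-dimensional array on a regular grid with `pts_per_rlu` points per reciprocal lattice unit, that element [0, 0] sits at an even (h, k) position, and that the cut-away Bragg regions are marked as NaN. The function name and these storage conventions are hypothetical and are not taken from the original analysis code.

```python
import numpy as np


def fold_to_patch(plane, pts_per_rlu):
    """Sum all 2 x 2 r.l.u. patches of an hk0 plane into a single patch."""
    size = 2 * pts_per_rlu                      # grid points per patch edge
    n_h = plane.shape[0] // size                # whole patches along h
    n_k = plane.shape[1] // size                # whole patches along k
    trimmed = plane[:n_h * size, :n_k * size]   # drop incomplete patches
    blocks = trimmed.reshape(n_h, size, n_k, size)
    return np.nansum(blocks, axis=(0, 2))       # NaN pixels are skipped


# Example: a 12 x 12 r.l.u. plane sampled at 10 points per r.l.u.
plane = np.random.default_rng(0).random((120, 120))
patch = fold_to_patch(plane, pts_per_rlu=10)
print(patch.shape)
```

Dividing the result by the number of non-NaN contributions per pixel would give an averaged patch instead of a summed one.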


3D-ΔPDF analysis 

Diffuse scattering was analysed using the 3D-ΔPDF method. The 
experimental diffuse scattering was reconstructed as stated above, 
the background air scattering subtracted by using an empty instrument 
run, and an optimal scale coefficient selected manually. The resulting 
diffuse scattering was averaged in the m-3m Laue group using outlier 
rejection as described by Blessing. Bragg peaks were removed using 
the ‘punch and fill’ procedure: spheres of intensity around the Bragg 
peaks were removed to ensure omission of thermal diffuse scattering 
contributions from subsequent analysis. The resulting holes were filled 
with the median intensity from a small surrounding region of 
reciprocal space. The crystals showed a large amount of thermal diffuse 
scattering around the Bragg peaks, and so the radius of spheres for 
punching and filling was chosen to be relatively large, approximately 
0.5 reciprocal lattice units. Finally, the 3D-ΔPDF map was calculated 
using a fast Fourier transform. Quantitative 3D-ΔPDF refinement was 
carried out using the program Yell. 
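
The ‘punch and fill’ step can be sketched as follows. This is a simplified two-dimensional illustration (discs in a plane rather than spheres in a volume), not the procedure as implemented for the analysis. The function name, the `shell` width used to define the surrounding region and the grid convention (`pts_per_rlu` points per reciprocal lattice unit, element [0, 0] at the origin) are assumptions.

```python
import numpy as np


def punch_and_fill(plane, peaks, pts_per_rlu, radius=0.5, shell=0.25):
    """Replace a disc around each Bragg position with the local median.

    plane: 2D intensity array; peaks: iterable of (h, k) positions given in
    r.l.u. relative to element [0, 0]; radius and shell are in r.l.u.
    """
    filled = np.array(plane, dtype=float)
    rows, cols = np.indices(plane.shape)
    h_grid = rows / pts_per_rlu
    k_grid = cols / pts_per_rlu
    for h, k in peaks:
        dist = np.hypot(h_grid - h, k_grid - k)
        hole = dist <= radius                              # region to punch
        ring = (dist > radius) & (dist <= radius + shell)  # fill reference
        if ring.any():
            filled[hole] = np.median(plane[ring])
    return filled


# Example: flat background with one sharp peak at (h, k) = (2, 2).
demo = np.ones((41, 41))
demo[20, 20] = 1000.0
cleaned = punch_and_fill(demo, peaks=[(2, 2)], pts_per_rlu=10)
print(cleaned.max())
```

The median is taken from the unmodified input, so the fill value for one peak is not affected by holes already filled around its neighbours.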

Here we give details for the representative case of the Co[Co] sample. 
The diffuse scattering map is presented in Extended Data Fig. 2a. Note 
that, owing to over-correction of the background, some pixels show 
negative intensities (marked red). 

The 3D-ΔPDF map is presented in Extended Data Fig. 2b. The 3D-ΔPDF 
map gives the difference between the crystal pair distribution 
function and its Patterson function. This map should be interpreted in an 
analogous manner to the Patterson map; in particular, the signal at a 
position uvw corresponds to all the pairs of atoms in the structure which 
are separated by the vector components u = xᵢ − xⱼ, v = yᵢ − yⱼ, w = zᵢ − zⱼ, 
where i and j index atom pairs. The 3D-ΔPDF consists of positive and 
negative signals. Positive signals mean that corresponding interatomic 
pairs are present more often in the real structure than in the average 
structure; negative signals mean the opposite. For a more detailed 
introduction to the 3D-ΔPDF see ref. *. 

Interpretation of the 3D-ΔPDF map in the current case is 
simplified by the presence of the strongly scattering atom cobalt in 
the partially vacant [Co(CN)₆]³⁻ moiety. The contribution of each 
interatomic pair is proportional to the product of the number of 
electrons in both atoms, and so the 3D-ΔPDF will be dominated by 
signals from pairs containing cobalt atoms. All atoms that are more 
likely to appear together with cobalt will give a positive contribu- 
tion, and all of the atoms that tend to appear less often will be seen 
as negative signals. 

The centre of the 3D-ΔPDF space is positive and represents all 
interatomic vectors within the [Co(CN)₆]³⁻ group. In the uv0 section, the signal 
appears as a cross. This is because the heavy cobalt atom is in the centre, 
and the four equatorial CN⁻ groups are around it. Similar crosses are 
located at positions corresponding to the face-centred lattice vectors. 
They correspond to the correlation between simultaneously present 
[Co(CN)₆]³⁻ groups. By contrast, the nearest neighbour at 0.5, 0.5, 0 
shows a negative correlation, meaning that the probability of finding 
two [Co(CN)₆]³⁻ groups separated by this vector is less than 4/9 (the 
fraction observed in a completely random distribution of vacancies). 
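
The reference value of 4/9 follows from the occupancy fraction of 2/3, assuming that the two sites are occupied independently of one another:

$$P_{\mathrm{random}}(\text{both present}) = \langle e \rangle^{2} = \left(\tfrac{2}{3}\right)^{2} = \tfrac{4}{9} \approx 0.44$$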

The program Yell enables refinement of the 3D-ΔPDF in terms of 
pair correlations. In order to speed up the refinement, the voxel size 
of diffuse scattering was increased in reciprocal space by binning sets 
of 5 × 5 × 5 voxels together. The final voxel resolution was 1/6 
reciprocal lattice units, corresponding to the pair distribution function map 
containing correlations spanning the nearest three unit cells in the x, 
y and z directions. 
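
The link between voxel size and real-space range can be made explicit. The following assumes the usual discrete Fourier transform relation, in which a reciprocal-space step Δq corresponds to a real-space map of total width 1/Δq, and takes the stated 1/6 as the step after binning (so the unbinned step is an inferred value, not one quoted in the text):

$$\Delta q_{\mathrm{binned}} = 5\,\Delta q_{\mathrm{raw}} = \tfrac{1}{6}\ \text{r.l.u.} \;\Rightarrow\; \Delta q_{\mathrm{raw}} = \tfrac{1}{30}\ \text{r.l.u.}, \qquad L = \frac{1}{\Delta q_{\mathrm{binned}}} = 6\ \text{unit cells} = \pm 3\ \text{unit cells}$$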

In our modelling of the 3D-ΔPDF, we have assumed that two-thirds 
of the M’ sites are occupied by [Co(CN)₆]³⁻ ions and the remaining 
one-third are occupied by an H₂O-filled vacancy (Extended Data Fig. 3). The 
latter was modelled with six structural water molecules and four zeolitic 
water molecules. The model 3D-ΔPDF is shown in the right-hand panel 
of Extended Data Fig. 2b. It is clear that the model accounts well for 
the majority of the observed features. The resulting correlation 
coefficients for the first 44 neighbours are listed in Extended Data Table 3. 


Monte Carlo simulations and analysis 

Monte Carlo simulations were carried out using a parallel tempering 
approach implemented within custom-written code. We note for 
context the use elsewhere of Monte Carlo methods for interpreting 
single-crystal diffuse scattering. In our case, for each J’ value, an 
ensemble of 129 configurations was generated and Monte Carlo 
simulations were carried out at a suitable distribution of temperatures T’ 
(0.75 ≤ T’ ≤ 4.97329, with T’ᵢ = 1.014889 × T’ᵢ₋₁; that is, evenly spread in 
log T’). Each configuration represented a 12 × 12 × 12 supercell of the 
fcc unit cell, containing a total of 6,912 sites. Configurations were 
initialized with a random distribution of vacancies, such that exactly 
one-third of the sites were vacant. The Monte Carlo steps involved swap 
moves: two sites, one occupied and one vacant, were selected at random 
and their contents swapped. In addition to these Monte Carlo steps, 
the algorithm involved replica exchange. An attempt for one replica 
exchange was performed every four Monte Carlo steps. For this, two 
reservoirs with nearest temperatures were selected at random, and 
the temperature swap was performed with probability 
p=expl[(E,- £5)/( T- Tl. The configurations were equilibrated for 
100 epochs (one epoch corresponds to the number of steps required 
to visit each site twice on average), followed by production steps of 
80 epochs each. The thermodynamic quantities (E, E²) were sampled 
at each Monte Carlo step, and all other quantities (tortuosity, diffuse 
scattering, etc.) were calculated from 40 configurations separated 
from one another by two epochs. Convergence was assessed from the 
convergence of the Monte Carlo energies of the models. It was found 
that almost all configurations converged, except for a few 
configurations of polymorphs II, IV and VI at the very lowest sampled 
temperatures. However, the diffuse scattering from unconverged configurations 
of phase IV showed sharp streaks parallel to the reciprocal axes a*, b* 
and c* that were very similar to the streaks observed experimentally 
in Mn[Co]. In our view, the experimental crystals are also not 
necessarily all at thermodynamic equilibrium, and so we decided to keep 
the simulation without change. The diffuse scattering patterns shown 
in Fig. 3 were calculated using a fast Fourier transform, averaged in the 
m-3m Laue group. Only the hk0 planes were extracted and shown. 
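
The simulation bookkeeping quoted above can be checked with a few lines of code. This is only a consistency sketch using the numbers in the text; the variable names are arbitrary, and the value of `steps_per_epoch` rests on the assumption that each swap move counts as a visit to two sites.

```python
N_REPLICAS = 129          # configurations per J' value (from the text)
T_MIN = 0.75              # lowest effective temperature T'
RATIO = 1.014889          # T'_i / T'_(i-1)

# Geometric ladder: evenly spaced in log T'
temperatures = [T_MIN * RATIO ** i for i in range(N_REPLICAS)]

CELLS = 12                # supercell edge in fcc unit cells
SITES_PER_CELL = 4        # M' sites in one fcc unit cell
n_sites = SITES_PER_CELL * CELLS ** 3
n_vacant = n_sites // 3   # exactly one-third of the sites are vacant

# Assumption: each swap move touches two sites, so visiting every site
# twice on average takes n_sites moves.
steps_per_epoch = n_sites

print(round(temperatures[-1], 3), n_sites, n_vacant)
```

The top of the ladder reproduces the quoted upper temperature limit to within the rounding of the ratio, and the site count matches the 6,912 sites stated for the supercell.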

Surface area and accessible pore volume were calculated 
using the Zeo++ code for small configurations, and a related 
custom-written code for larger configurations. 

Tortuosity was calculated as the average of the distance from a 
vacancy site to a plane along the percolation channel divided by the 
‘flight’ distance to the plane: τ = d_l/d_f. Here, d_l is the length along the 
percolating links from the vacancy to a plane and d_f is the ‘flight’ 
distance of the same vacancy to the plane. The values of tortuosity were 
calculated in each of the ⟨100⟩ directions and then averaged. Because 
vacancies are connected with each other in the ⟨110⟩ directions, the 
smallest tortuosity achievable in this structure is √2. 
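
The lower bound of √2 can be seen from the geometry of a single link. A ⟨110⟩ link between neighbouring M’ sites has length a/√2 but advances only a/2 along any ⟨100⟩ direction, so even a path in which every link makes progress towards the plane gives (writing a for the cubic cell parameter):

$$\tau_{\min} = \frac{d_l}{d_f} = \frac{a/\sqrt{2}}{a/2} = \sqrt{2} \approx 1.41$$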

Strain effects mean the energy of charged defects may not always 
scale as the square of the charge (see, for instance, ref. *), and so we have 
checked the robustness of our results with the modified Hamiltonian 
where, compared to equation (1), we have used the absolute value in the 
J₁ term. The resulting phase diagram is essentially the same, albeit the 
simulations converge considerably more slowly, owing to additional 
frustration in phase V. The diffuse scattering from this model is shown 
in Extended Data Fig. 4. 

Surface area and accessible pore volume calculations were 
performed using the Zeo++ code. We found the code to be prohibitively 
slow for the large 12 × 12 × 12 configurations, and so we used smaller 
configurations (6 × 6 × 6) for initial calculations. In these calculations 
we observed that both the accessible volume and the accessible surface 
areas were directly proportional to the number of accessible 
vacancies in the structure. Consequently, for the larger 12 × 12 × 12 
configurations, we obtained the final values using the following relations: 
S_a = 1,551 × N_a/N_t [m² g⁻¹] and V_a = 0.074 × N_a/N_t [ml g⁻¹]. Here, V_a is the 
total accessible volume, S_a is the accessible surface, N_a is the number 
of accessible vacancies and N_t is the total number of vacancies in the 
simulation box. 
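
The two scaling relations are simple enough to state as functions. The sketch below uses the prefactors quoted in the text; the function names and the example count of accessible vacancies are hypothetical.

```python
def accessible_surface(n_accessible, n_total):
    """Accessible surface area in m^2 per gram from the vacancy counts."""
    return 1551.0 * n_accessible / n_total


def accessible_volume(n_accessible, n_total):
    """Accessible pore volume in ml per gram from the vacancy counts."""
    return 0.074 * n_accessible / n_total


# Example: a 12 x 12 x 12 box holds 2,304 vacancies; suppose (purely for
# illustration) that 1,728 of them are reachable from the surface.
print(accessible_surface(1728, 2304), accessible_volume(1728, 2304))
```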


Location of projections within the phase diagram 

For all but one PBA system, the corresponding experimental diffuse 
scattering ‘tiles’ shown in Fig. 3 were positioned to minimize the dif- 
ference between experimental and model intensities. We determined 
these differences using the diffuse scattering R factor: 

$$R = \frac{\sum_{hkl} \{ I_e(hkl) - [\, s I_m(hkl) + c \,] \}^2}{\sum_{hkl} [ I_e(hkl) ]^2} \qquad (3)$$

Here, s is the scale coefficient of the model, c is a constant background, 
and I_e and I_m are the experimental and model scattering intensities, 
respectively. The parameters s and c were determined by linear 
minimization of the R factor. 
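
Because the denominator of the R factor does not depend on s or c, minimizing R is an ordinary linear least-squares problem. A minimal sketch of that fit is given below; it uses NumPy's `numpy.linalg.lstsq` and is an illustration of the approach, not the code used for the analysis. The function name and the array inputs are assumptions.

```python
import numpy as np


def fit_r_factor(i_exp, i_model):
    """Least-squares scale s and background c, and the resulting R factor."""
    i_exp = np.asarray(i_exp, dtype=float).ravel()
    i_model = np.asarray(i_model, dtype=float).ravel()
    # Linear model: i_exp ~ s * i_model + c
    design = np.column_stack([i_model, np.ones_like(i_model)])
    (s, c), *_ = np.linalg.lstsq(design, i_exp, rcond=None)
    residual = i_exp - (s * i_model + c)
    r = np.sum(residual ** 2) / np.sum(i_exp ** 2)
    return float(r), float(s), float(c)


# Example: synthetic data that differ from the model by scale and offset.
model = np.arange(10.0)
measured = 2.0 * model + 3.0
print(fit_r_factor(measured, model))
```

Evaluating `fit_r_factor` for each simulated tile and taking the smallest R would then identify the best-matching position in the phase diagram.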

The exception to this approach was in the case of Mn[Co]. Here, our 
experimental data showed the presence of sharp streaks of diffuse 
scattering parallel to the a*, b* and c* directions. The R-factor approach 
described above could not correctly place this tile, owing to the absence 
of accurate modelling of the experimental resolution function. Instead, 
this particular tile was placed within area IV of the phase diagram, which 
we felt best accounted for the qualitative features of the experimental 
data. The projected experimental diffuse scattering patterns and the 
closest matching tiles from the model phase diagram are compared 
in Extended Data Fig. 5. 

As a final point we note that comparison metrics alternative to the 
one we propose here are easily envisaged. We tested a number of these
metrics during the course of our own analysis and found that different 
approaches gave slightly different positions for the various diffuse scat- 
tering tiles. Nevertheless, in essentially all cases the general features 
noted in the main text were preserved; for example, that Zn[Co] and 
Cd[Co] were placed on the right-hand side of the phase diagram, Cu[Co] 
on the left, and the M= Mn, Co samples were near/’ = 1. 


Additional diffraction features in PBAs 

In addition to the vacancy-driven diffuse scattering and satellites 
described previously, we observed diffraction features that are not 
related to vacancy order. These intensities are not covered by our model 
(understandably), but we expand here on why their presence does not 
affect our analysis or the conclusions drawn. 

For the Cu[Co] sample, we observed not only vacancy-driven diffuse
scattering and Bragg peaks corresponding to P centring, but also
additional Bragg peaks near the half-integer positions. The indices of
such reflections are approximately equal to (h, k, 0.9l + 0.5); that is,
they come from a tetragonally distorted crystallite with unit-cell
parameters a′ = a_main and c′ = 1.1a_main, where a_main is the unit-cell
parameter of the primary crystal phase (Extended Data Fig. 6). Other Bragg
peaks from this sample showed no signs of such tetragonal distortion and
the total intensity of additional reflections was less than 1% of the
total main reflections, and so we can conclude that these additional
reflections come from a tetragonally distorted impurity precipitate
which is coherent with respect to the primary cubic matrix.

Some crystals also showed diffuse scattering centred at positions of
the type (h + 1/3, k + 1/3, l) (Extended Data Fig. 7). In particular,
these additional signals were visible for the Cd[Co], Mn[Mn], Mn[Fe] and
Mn[Co]′ crystals, but not for the Mn[Co], Zn[Co], Cu[Co] or Co[Co]
crystals. The intensity of this diffuse scattering was zero for l = 0 (in
the planes passing through the centre of reciprocal space), and increased
with the magnitude of l. This is a typical signature of diffuse
scattering from displacive correlations, with displacement polarized
along the c direction. Interestingly, in the Mn[Co]′ and Cd[Co] samples
the diffuse scattering intensities follow tetragonal symmetry, instead of
the cubic symmetry of the Bragg peaks. Note that because displacive
diffuse scattering is absent in the hk0, h0l and 0kl sections it has no
contribution to the projected diffuse scattering tiles from Figs. 2 and
3 and thus does not influence the analysis of vacancy distributions.


Data availability 
Extended Data Fig. 1 | Representative pore networks. Representative pore
networks for each phase within the Monte Carlo simulated phase diagram.

Extended Data Fig. 2 | Representative 3D-ΔPDF. a, Experimental diffuse
scattering from the Co[Co] sample, hk0 section; voxel size is 1/30
reciprocal lattice units. b, Experimental and model 3D-ΔPDF map of the
Co[Co] sample, uv0 section.

Extended Data Fig. 3 | M′-site models. The structure of the [Co(CN)6]3−
and ‘vacancy’ moieties used in our Co[Co] model.

Extended Data Fig. 4 | Alternative diffuse scattering phase map. Diffuse
scattering calculated with the modified model Hamiltonian; equation (2).

Extended Data Fig. 5 | Diffuse scattering tiles. Comparison of projected
experimental diffuse scattering with the model diffuse scattering tiles
for our various PBA samples; compare with Fig. 3.

Extended Data Fig. 6 | Satellite reflections. Satellite reflections in
our Cu[Co] sample. The inset shows one specific satellite at (7.43, 1, 0).

Extended Data Fig. 7 | Inelastic scattering. Diffuse scattering in the
Mn[Co]′ sample. Note that the intensity of the diffuse scattering at the
(h + 1/3, k + 1/3, l) positions increases with increasing l; the
scattering in the hk4 layer (right) is stronger than in the hk2 layer
(left).


Extended Data Table 1| Synthesis summary 
Extended Data Table 2 | Data collection strategies

PBA       Beamline        λ (Å)    Δr (Å)   φ range (°)   Δφ (°)
Cu[Co]    I19             0.6889   0.83     360           0.1
Co[Co]    I19             0.6889   0.83     360           0.1
Mn[Co]    I19             0.6889   0.83     360           0.1
Mn[Co]′   I19             0.6889   0.83     360           0.1
Mn[Fe]    I19             0.6889   0.83     360           0.1
Cd[Co]    BM01            0.6975   0.64     360           0.1
Zn[Co]    I19             0.6889   0.83     360           0.1
Mn[Mn]    BM01 (Mar345)   0.71     ~1.3     180           0.5

Single-crystal X-ray diffuse scattering data collection strategies. Included in the last row are, 
for comparison, the relevant values for the Mn[Mn] sample reported in ref. *". 
Δr, diffraction resolution; Δφ, angular step size.


Extended Data Table 3 | 3D-ΔPDF model
Outdoor air pollution adversely affects human health and is estimated to
be responsible for five to ten per cent of the total annual premature
mortality in the contiguous United States. Combustion emissions from a
variety of sources, such as power generation or road traffic, make a
large contribution to harmful air pollutants such as ozone and fine
particulate matter (PM2.5). Efforts to mitigate air pollution have
focused mainly on the relationship between local emission sources and
local air quality. Air quality can also be affected by distant emission
sources, however, including emissions from neighbouring federal states.
This cross-state exchange of pollution poses additional regulatory
challenges. Here we quantify the exchange of air pollution among the
contiguous United States, and assess its impact on premature mortality
that is linked to increased human exposure to PM2.5 and ozone from seven
emission sectors for 2005 to 2018. On average, we find that 41 to 53 per
cent of air-quality-related premature mortality resulting from a state’s
emissions occurs outside that state. We also find variations in the
cross-state contributions of different emission sectors and chemical
species to premature mortality, and changes in these variations over
time. Emissions from electric power generation have the greatest
cross-state impacts as a fraction of their total impacts, whereas
commercial/residential emissions have the smallest. However, reductions
in emissions from electric power generation since 2005 have meant that,
by 2018, cross-state premature mortality associated with the
commercial/residential sector was twice that associated with power
generation. In terms of the chemical species emitted, nitrogen oxides and
sulfur dioxide emissions caused the most cross-state premature deaths in
2005, but by 2018 primary PM2.5 emissions led to cross-state premature
deaths equal to three times those associated with sulfur dioxide
emissions. These reported shifts in emission sectors and emission species
that contribute to premature mortality may help to guide improvements to
air quality in the contiguous United States.


Long-term exposure to fine particulate matter (PM2.5) and ozone leads
to an increased risk of premature death. Indeed, PM2.5 and ozone are the
most prominent known causes of early deaths associated with outdoor air
pollution, resulting in more than 90% of total air-pollution-related
mortalities. For this reason, PM2.5 and ozone have become the predominant
pollutants for quantifying air quality. These pollutants form mainly
through atmospheric chemical reactions following the release of precursor
emissions. PM2.5, which consists of particles and liquid droplets, forms
from gaseous precursor emissions of nitrogen oxides (NOx), sulfur oxides
(SOx), ammonia (NH3), and others. PM2.5 can also be emitted directly, as
in the case of black carbon. Ozone forms from gaseous precursor emissions
of NOx and volatile organic compounds (VOCs). The adverse health impacts
due to exposure to PM2.5 and ozone can therefore be attributed to the
precursor emissions that lead to their formation. Such attribution is
useful, as it is these emissions that can be directly controlled, rather
than the exposure that results from them.

Combustion emissions constitute the largest source of anthropogenic
emissions in the USA, and therefore contribute to the formation of PM2.5
and ozone. The health impacts attributable to these emissions have been
estimated in various studies, with estimates varying between 90,000 and
360,000 early deaths per year. In the context of the Environmental
Protection Agency (EPA) Cross-State Air Pollution Rule (CSAPR) and
individual state regulation, measures to further reduce the health
impacts of pollution would benefit from a greater understanding of which
sectors and which states are responsible for the health impacts in every
other state.

Prior studies have investigated parts of this problem. One study
estimated the sources of US PM2.5 pollution impacts on a fine scale,
with other work focusing on the roles of individual emission sectors
or species for either pollutant. Numerous other studies have focused
on examining the roles of different emission sectors or species,
without quantifying the aspect of pollution exchange. Variations in
time have also been discussed. In all cases these studies have focused
on only one or two of the dimensions of the problem (emission sector,
emission species, pollutant and exchange), and no previous work has
integrated these aspects together into a single study. As such, to date
there has been no assessment of cross-state pollution exchange that
quantifies the influence, by sector and chemical species, of each state
on every other state’s health risk, using detailed chemistry-transport
modelling and including both PM2.5 and ozone.

In this work, we estimate the pollution exchange between the 48
contiguous US states, and form source-receptor relationships between
them for combustion emissions from seven sectors: electric power
generation; industry; commercial/residential; road transportation;
marine; rail; and aviation. The commercial/residential sector includes
residential combustion (for example, of biomass), nonindustrial
commercial and institutional processes, and waste treatment, among other
sources. This analysis yields estimates for the number of early deaths
due to PM2.5 (primary and secondary, excluding secondary organic
aerosols) and ozone exposure in every state, with attribution of impacts
to each sector and each emitted chemical species from every state. We
estimate combustion emissions for the seven sectors for 2005 (based on
the 2005 National Emissions Inventory (NEI)), 2011 (based on NEI 2011)
and 2018 (based on the NEI 2011 forecast), and present these findings in
Extended Data Table 1. Lists of the specific sources that are grouped in
each sector are included in the associated data repository (see
Methods). The impacts of these emissions on each state’s air quality are
then quantified using receptor-oriented atmospheric sensitivities from
the adjoint of the GEOS-Chem chemistry-transport model (see Methods).

We calculate the pollution exchange between every state pair for the
contiguous US for every combination of emission sector, PM2.5 or ozone
precursor emission species, and year. The 2011 source-receptor relations
for the two pollutants and the total impacts are summarized in Fig. 1a.
Matrices for different sectors and emission species are presented in
Fig. 1b. Source-receptor matrices for all three years are presented in
Extended Data Fig. 2.

Fig. 1 | Early-death source-receptor matrices for 2011. a, Source-receptor
matrix showing total early deaths per year for 48 × 48 states (right),
and its breakdown into PM2.5- and ozone-attributable impacts (left).
b, Source-receptor early-death attribution to emission sectors (top) and
emission species (bottom) that lead to the formation of PM2.5 and/or
ozone. States are grouped into US Bureau of Economic Analysis regions and
ordered west (left) to east (right) (ordering presented in Extended Data
Fig. 1). Boxed percentages represent the fraction of impacts that occur
out of the state that caused the corresponding emissions. Obtaining the
summarized matrices shown using conventional approaches (‘forward
difference’) would require 433 year-long simulations. Extended Data
Fig. 2 presents corresponding matrices for 2005 and 2018.

The relative percentage of total impacts that occurred outside of the
emitting state decreased with time, from 53% in 2005, to 45% in 2011 and
41% in 2018, meaning that there has been a declining relative magnitude
of cross-state impacts. This fraction varies substantially between
sectors. Electric power generation is the only sector that is regulated
by the CSAPR, and has the highest out-of-state impacts as a fraction
relative to in-state impacts: on average, approximately 70% of early
deaths from this sector occur outside of the state that caused the
emissions. However, with reductions in emissions from electric power
generation, by 2018 there were 70% fewer out-of-state early deaths
(approximately 13,000 fewer early deaths) by comparison with 2005. Road
transportation, industry and commercial/residential emissions resulted
in higher cross-state early deaths in 2018 than electric power
generation (by 28%, 42% and 74% respectively), but are not regulated by
the CSAPR at present. Although PM2.5 and ozone impacts can vary by +125%
to -65% depending on the specific choice of concentration-response
function (see Methods), this disagreement does not affect the net
pollution exchange between states and the impacts attributable to each
sector.

The results presented in Fig. 1a, b reflect both PM2.5- and
ozone-attributable early deaths. Although the number of early deaths per
additional unit of emission is approximately eight times higher for PM2.5
than for ozone (not accounting for nonlinear interactions; see Methods),
ozone impacts are typically transported farther. The fraction of PM2.5
impacts that happen out of the state that caused them was approximately
41% for 2011, compared with approximately 75% for ozone for the same
year. The full source-receptor matrices for each sector-year and
species-year combination are included in the data repository (see
Methods).

The fact that the source-receptor matrices, presented in Fig. 1, are
not symmetric about the diagonal implies that there is a net imbalance
in the exchange of early deaths between the US states. Figure 2
presents this exchange in terms of the air-quality-related early deaths
per capita because of emissions from each state (Fig. 2a) and occurring
within each state (Fig. 2b), as well as the net exchange between states
(Fig. 2c). A positive value in Fig. 2c indicates that a given state is a
net ‘exporter’ of early deaths; that is, that emissions in that state cause
Fig. 2 | Total annual early deaths caused per 10,000 people for 2005, 2011
and 2018. The left plots (a) show the total aggregate early deaths caused by 
emissions in each state, divided by the population of the emitting state. The 
middle plots (b) show the total early deaths caused in each state, divided by the 


more early deaths outside of the state than are caused within that state
by emissions from elsewhere. A negative value indicates the opposite:
that the state is a net importer of early deaths.
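To make these quantities concrete, the sketch below derives the out-of-state fraction and the net export from a source-receptor matrix. It is an illustration only: the function name, the dictionary keys and the convention that rows are source states and columns are receptor states are assumptions, and the values are absolute early deaths rather than the per-capita values plotted in Fig. 2.

```python
def exchange_summary(matrix):
    """Summarize a square source-receptor matrix.

    matrix[i][j] = early deaths in receptor state j caused by emissions
    from source state i (assumed layout).
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    summary = []
    for i in range(n):
        caused = sum(matrix[i])                          # row sum
        incurred = sum(matrix[k][i] for k in range(n))   # column sum
        in_state = matrix[i][i]
        exported = caused - in_state
        imported = incurred - in_state
        summary.append({
            "caused": caused,
            "incurred": incurred,
            "out_of_state_fraction": exported / caused if caused else 0.0,
            # positive: net exporter; negative: net importer
            "net_export": exported - imported,
        })
    return summary


if __name__ == "__main__":
    # Hypothetical two-state system, not data from the study.
    for row in exchange_summary([[60.0, 40.0], [10.0, 90.0]]):
        print(row)
```

In this toy case the first state exports 40 early deaths and imports 10, so it is a net exporter of 30, and the second state is a net importer by the same amount; an asymmetric matrix always produces such an imbalance.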

Three broad patterns are visible. First, the largest exporters are in
the northern midwest, owing to low local populations, high emissions,
and large downwind populations. Wyoming was the highest exporter on a
per capita basis in 2005, with North Dakota and West Virginia following.
While these states remained some of the largest per-capita exporters in
2018, their exported impacts fell by roughly 50% over this period (see
the examples in Extended Data Table 2). Second, a cluster of states in
the northeast are consistent net importers of impacts. New York was the
highest net importer of early deaths in all three years, on both a
per-capita and an absolute basis. For 2011, the approximately 2,800
deaths incurred in New York because of New York emissions represent 60%
of the total deaths caused by New York emissions, and approximately 40%
of the total air-quality-attributable deaths in the state. This implies
that around 60% of deaths in New York are imported from other states.
Finally, states on the west coast have a net exchange of around 0, owing
to a combination of no upwind emissions (attributable to any state),
relatively sparse population downwind, and large local populations. We
present examples of state-level sectoral contributions in Extended Data
Figs. 3-5.

Figures 3a, b present the US-wide early-death impacts for each sec- 
tor and each chemical species, respectively. Impacts from all sectors 
decrease over the studied period, with the exception of commercial/ 
residential and aviation (landing and take-off only). Impacts due to 
commercial/residential emissions increase by 31% between 2005 and 
2011, but remain steady (within approximately 5%) from 2011 to 2018. 
Aviation landing and take-off impacts increase by approximately 60% 
between 2005 and 2018, but contribute around 0.3% to the summed 
2018 impacts. Impacts from electric power generation reduce from 
22% of total summed impacts in 2005 to 11% in 2018. We estimate that 
reductions in emissions from electric power generation have led to 
around 15,900 avoided early deaths in 2018 and, interpolating lin- 
early, to approximately 137,000 avoided early deaths integrated over 
the 14 years analysed here. Because of these changes, electric power 


population of the state. The right plots (c) show the total early deaths exported
by each state, divided by the population of the state (that is, the difference
between plots a and b). These impacts are based on summed contributions
from each emitted species (see Fig. 3).


generation changes from being the second most important emission 
sector to the fourth, while commercial/residential emissions go from 
fourth to first, responsible for 37% of the summed early deaths attribut- 
able to combustion emissions in 2018. 

In terms of speciated impacts (that is, emissions species that
contribute to the formation of, and exposure to, PM2.5 and/or ozone),
primary PM2.5 emissions had the greatest impact in all three model years.
They also stayed relatively consistent, with a 13% reduction in health
impacts from 2005 to 2018. SO2, which was the third-greatest contributor
to impacts in 2005, making up 19% of the summed impacts, was contributing
less than 6% by 2018. This was due to an approximately 80% reduction in
SO2 emissions.

Ammonia-attributable impacts increased by around 21% between 2005 and
2018. This difference was driven by an increase in the sensitivity of
PM2.5 exposure with respect to a unit of ammonia emissions between 2005
and 2011. Owing to the decline in the importance of SO2, ammonia impacts
went from being the fourth-greatest to the third-greatest contributor to
total impacts over this period, increasingly close to the contribution
of NOx species. NOx remained the second-greatest contributor to impacts
from 2005 to 2018. Despite the roughly 50% reduction in total NOx
emissions between 2005 and 2018, impacts attributable to NOx reduced by
only around 35% between the two years. This is largely due to the
increased sensitivity of PM2.5 formation to NOx emissions between 2005
and 2011, as noted previously.

On the basis of a linear combination of impacts by sector, we 
estimate US combustion emissions in 2005, 2011 and 2018 to have 
resulted in 111,200 (95% confidence interval 78,100-144,800), 93,700 
(65,600-121,800) and 76,500 (53,300-99,600) early deaths, respec- 
tively. However, the total impact of all US anthropogenic emissions 
is different to the combined effect of each individual sector or spe- 
cies, owing to nonlinear interactions between the emitted chemicals 
(Fig. 3c). These interactions reduce the total impacts attributable to 
PM2.5 by 30-34%. Impacts attributable to ozone instead increase by
a factor of 2.4 to 2.8 (with the nonlinearity underlying this shown 
in Extended Data Fig. 6), raising the fraction of total early deaths 
attributable to ozone exposure from roughly 10% to around 30%. 
Fig. 3 | Total annual early deaths attributable to emission sector, emission
species and in total. a, Total annual early deaths attributable to each emission
sector. b, Total annual early deaths attributable to each emission species that
leads to the formation of PM2.5 and/or ozone. c, Total annual early deaths. Data
are shown for 2005, 2011 and 2018, and for PM2.5 and ozone. In c, three totals are


Taking these nonlinearity effects into account results in total US 
combustion emissions impacts of 96,600 (95% confidence interval 
74,200-125,000), 83,300 (62,400-104,200) and 66,100 (49,300- 
82,900) early deaths for 2005, 2011 and 2018, respectively. This effect 
highlights the difference in expected changes in population exposure 
that result from marginal changes by comparison with larger-scale 
emissions increases or reductions. An explanation of this effect and 
its quantification is given in the Methods. The atmospheric nonlinear- 
ity is also reflected in our computed sensitivity differences between 
2005 and 2011. Thus, a 1% reduction in 2011 emissions would lead to
roughly 940 avoided early deaths. Had the atmospheric response
to a unit of emissions remained constant between 2005 and 2011 (in
terms of meteorology and background concentrations), the same 
emissions reduction would have led to around 780 avoided early 
deaths. The changing atmospheric composition thus increases the 
early deaths attributable to a unit of emission. These three effects 
are displayed in Fig. 3c. 

Overall, we have found that more than 40% of the combustion-emis- 
sions-related early deaths cross state lines. This highlights the need for 
a cooperative approach between states for reaching air-quality targets 
or targeting problematic areas, as underlined by the introduction of 
EPA’s CSAPR*. We find that the electric power generation sector is of 
presented: the sum of all sectors/species (‘Summed’), which does not account
for nonlinear interactions between species; the sum of all sectors/species with 
varying emissions, but constant (2005) atmosphere (‘Constant atmospheric 
response’); and the total impacts after accounting for nonlinear interactions 
between species. Tabulated results are presented in Extended Data Table 3. 


declining importance to air quality, by comparison with the increasing
importance of commercial/residential emissions. A 10% decrease in
emissions from the commercial/residential sector would have 3.3 times
greater benefit than a further 10% decrease in emissions from electric
power generation. This is reflected in the declining relative importance
of SO2, and the increasing relative importance of primary PM2.5 and
ammonia. A 10% decrease in primary PM2.5, NOx and ammonia emissions would
now have 7, 4.5 and 4 times the benefit, respectively, compared with a
further 10% decrease in SO2 emissions. These changing relative sectoral
and speciated influences provide room to advance air-quality mitigation
efforts in the US.
Methods


We present here the data and models used in calculating the cross-state 
early-deaths caused by combustion emissions. We first estimate the spe- 
ciated emissions for each combustion sector. We then use the adjoint 
of a chemistry-transport model to estimate the impact of changes 
in emissions on population exposure. Finally, we relate increases in 
population exposure to public health impacts (early deaths) using 
epidemiologically derived concentration-response functions. These 
steps, intercomparison of our results against existing literature, and 
the limitations of our approach are outlined below. 


Combustion emissions 

Emissions are attributed to each of the six nonaviation sectors in the
US using SMOKE and the EPA National Emissions Inventory (NEI) for the
year of 2005 (previously used in ref. °), 2011 (NEI2011v6 version 1) and
the 2011-based forecast for 2018 (refs. 7’). These are generated on a
12 km × 12 km or 36 km × 36 km grid, and regridded to the 0.5° × 0.666°
(latitude × longitude) grid of the nested GEOS-Chem adjoint model.
The full list of individual sources (and corresponding EPA source
classification code (SCC) identifiers) that comprise each sector are
provided in the data repository noted in the Methods section ‘Data
and code availability’. For road transportation for 2011 and 2018, we
use the EPA MOtor Vehicle Emissions Simulator (MOVES)-processed
emissions. For aviation emissions we use the Aviation Environmental
Design Tool (AEDT) inventories for 2006, 2010, 2012 and 2015
(ref.”°). When referring to each of 2005, 2011 and 2018 aviation impacts,
we imply impacts from 2006, the average of 2010 and 2012, and 2015
emissions respectively, owing to the absence of more recent datasets.
Only aviation emissions that occur within 1 km of the surface (landing
and take-off emissions) are taken into account. These have been shown
to capture roughly one-third of total aviation-emissions-attributable
early deaths in the US. We account for the underrepresentation of
EPA’s point-source oil and gas sector (pt_oilgas) in NEI2011v6 version 1,
by distributing the underrepresented NOx (the difference in pt_oilgas
NOx between version 3 and version 1) to the industry sector NOx
emissions on a state level, assuming the existing spatial distribution.
When calculating state source-receptor matrices for the marine sector,
we only consider marine emissions within state boundaries and within, on
average, around 25 km off the coast over the sea (where applicable).
Besides the marine sector, which does not necessarily fall within state
boundaries, we do not account for the impacts of emissions that occur
outside of this domain and might contribute to US early deaths. Further
details on emissions modelling are provided in the data repository.


Air-quality modelling 

We use the adjoint of the GEOS-Chem chemistry-transport model to
calculate the sensitivities of the aggregate population exposure in each
of the 48 contiguous US states with respect to the various emission
species in the North American domain. The resolution of the horizontal
grid is 0.5° × 0.666° (roughly 55 km × 55 km) (latitude × longitude),
with 47 vertical layers up to 80 km. This horizontal resolution is
adequate for capturing state-wide impacts. Boundary conditions for the
nested domain are obtained from the global GEOS-Chem model run at 4° × 5°
resolution, driven by corresponding global meteorological data. Each
of the 48 sensitivities quantifies the effect that any emission species
in any location in the contiguous US and at any time will have on the
population exposure to PM2.5 or ozone in each corresponding state.
We define PM2.5 as the mass sum of nitrates, sulfates, ammonium, black
carbon and organic carbon, capturing both primary and secondary
PM2.5 concentrations. Secondary organic aerosols are not captured. We
perform an annual simulation for each of PM2.5 and ozone state-level
exposure, in each contiguous US state, for 2006 and 2011, resulting in
192 annual adjoint simulations in total (48 × 2 × 2). We use GEOS
assimilated meteorological data from the Global Modelling and
Assimilation Office (GMAO) at the NASA Goddard Space Flight Center. The
year 2006 was climatologically warm in the US, with the annual average
temperature being 0.55 °C higher than the 1995-2015 mean, whereas
2011 was climatologically average with an average temperature 0.04 °C
lower than the 1995-2015 mean. For 2018 we use the 2011 atmospheric
response. Given the change between 2005 and 2011 (comparing the
‘Summed’ and ‘Constant atmospheric response’ in Fig. 3), we expect
that this approximation will result in a maximum error of around 15%
(as there were larger emissions changes between 2005 and 2011 than
between 2011 and 2018). Total impacts across all sectors are calculated
using additional ‘forward’ runs, described at the end of this section.

The GEOS-Chem baseline emissions are from EPA’s NEI for 2005 and
2011 accordingly. Previous studies have found that the NEI 2011 road
transportation NOx emissions are overestimated by around 50% in the
southeast and nationally. The effects of this are not included here
as they are, as of the time of writing, not incorporated in EPA’s NEI. An
overestimation of 50% in the road transportation NOx emissions in 2011
implies that results presented here overestimate road transportation
early deaths by around 7,500 (95% confidence interval 5,200-9,700)
early deaths per year. Other emissions sources, both natural and
anthropogenic, are simulated using the standard GEOS-Chem nested North
American domain datasets. The Emissions Database for Global Atmospheric
Research (EDGAR) global anthropogenic emissions inventory drives
the global model (from which the boundary conditions for the nested
simulations are generated). This is replaced by regional emissions
inventories where available (for example, NEI). Biogenic emissions
are from the Model of Emissions of Gases and Aerosols from Nature
(MEGAN) inventory, and lightning NOx emissions are calculated on
the basis of ref. *°.

We estimate the impacts of each sector by performing an inner 
(Hadamard) product of the sensitivities with the gridded emissions 
for each of the seven sectors, and calculate the corresponding popula- 
tion exposure impacts. This linear approach was used and validated in 
refs, °?°41-8 against the forward model difference method. 
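A minimal sketch of this linear attribution step is given below; it is not the study's code. The function name is hypothetical, and both inputs are assumed to be equally shaped two-dimensional grids (nested lists) in which each sensitivity cell gives the change in a state's population exposure per unit of emission in that cell.

```python
def sector_exposure_impact(sensitivity, emissions):
    """Sum of element-wise products of a sensitivity grid and an emissions grid."""
    if len(sensitivity) != len(emissions):
        raise ValueError("grids must have the same number of rows")
    total = 0.0
    for s_row, e_row in zip(sensitivity, emissions):
        if len(s_row) != len(e_row):
            raise ValueError("grids must have the same number of columns")
        # Element-wise (Hadamard) product, accumulated over the row.
        total += sum(s * e for s, e in zip(s_row, e_row))
    return total


if __name__ == "__main__":
    # Hypothetical 2 x 2 grids, not values from the study.
    sens = [[1.0, 2.0], [0.5, 0.0]]
    emis = [[10.0, 20.0], [30.0, 40.0]]
    print(sector_exposure_impact(sens, emis))  # 10 + 40 + 15 + 0
```

Because one adjoint run yields the sensitivity grid for a receptor state, the same grid can be reused for every sector and species simply by swapping the emissions grid, which is what makes the receptor-oriented approach cheaper than repeated forward simulations.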

When calculating the total impacts from all sectors combined, we 
use a different approach to take into account nonlinear interactions 
between the sectors. Total impacts are calculated by comparing the 
surface concentrations in forward GEOS-Chem simulations with and 
without all US anthropogenic emissions. These forward model simula- 
tions allow us to quantify nonlinearity in the response of US air quality. 
Sets of seven forward simulations are conducted for both 2005 and 2011 
to quantify this nonlinearity. Extended Data Fig. 6 shows how the
simulated, population-weighted concentrations of ozone and PM2.5 respond
to large changes in emissions (‘Average sensitivity’). Compared with the
sensitivities used for single-sector and speciated impact calculations
(‘Marginal sensitivity’), the full, nonlinear PM2.5 response to removal
of all emissions is found to be 30-34% smaller, while the ozone response
is found to be 2.4-2.8 times greater, implying greater nonlinearity
effects for ozone by comparison with PM2.5. This is because ozone
sensitivities are larger when ozone concentrations are low, owing to the
greater ozone-production efficiencies in a clean background atmosphere.
For PM2.5, the response nonlinearity is driven by competition between
SO4²⁻ (from emitted SO2) and NO3⁻ (from emitted NOx) for ammonia.

Total impacts for 2018 are estimated by scaling the 2011 response. 
The scaling factor is calculated as the total growth in US population, 
multiplied by the ratio of the linearized response to 2018 and 2011 
emissions. 
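
The 2018 scaling described above amounts to two multiplicative factors. The sketch below is one reading of that sentence; the function name and the example numbers are hypothetical.

```python
def scale_2011_total(total_2011, pop_2011, pop_2018, linear_2011, linear_2018):
    """Scale the 2011 forward-model total to 2018.

    The factor is the population growth multiplied by the ratio of the
    linearized (adjoint-based) responses to 2018 and 2011 emissions.
    """
    population_growth = pop_2018 / pop_2011
    response_ratio = linear_2018 / linear_2011
    return total_2011 * population_growth * response_ratio


# Hypothetical inputs: 10% population growth and a 25% smaller linearized response.
scaled = scale_2011_total(1000.0, 300.0, 330.0, 200.0, 150.0)
```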


Health impacts 

We quantify air-quality impacts in terms of early deaths (premature 
mortalities). The toxicity of different PM2.5 species is assumed to be 
equal, consistent with EPA practice. As with any study of air-pollution 
impacts, our results are sensitive to the specific choice of 
concentration-response function (CRF). To calculate the effects of PM2.5 
exposure, we apply the American Cancer Society (ACS) cohort study 
log-linear response estimate of 6% (range 4-8%) increased risk of 
all-cause mortality per 10 μg m⁻³ increase in annually averaged PM2.5 
exposure, derived for 1999-2000 exposures using the random-effects Cox 
model, and adjusted for 44 individual-level and 7 ecological covariates’. 
This estimate is linearized and applied here for adults over 
30 years old. This CRF has been applied in a number of estimates of 
US pollution impacts****; it is consistent with the results of a global 
meta-analysis of epidemiological literature, which also found a 6% 
(range 4-8%) increase in risk per 10 μg m⁻³ (ref. ’). 
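
As an illustration of how a log-linear CRF of this kind translates exposure into early deaths, the sketch below computes the slope from the 6% per 10 μg m⁻³ estimate and applies it to a population. The exact linearization used in the study is not spelled out here, so the first-order form is an assumption, and the population, baseline rate and exposure in the example are hypothetical.

```python
import math


def early_deaths(delta_c, population, baseline_rate, rr_per_10=1.06, linearized=True):
    """Early deaths attributable to a PM2.5 exposure of delta_c (ug/m3).

    population:    exposed adults (over 30 years old)
    baseline_rate: baseline all-cause deaths per person per year
    rr_per_10:     relative risk per 10 ug/m3 (1.06 is the central estimate)
    """
    beta = math.log(rr_per_10) / 10.0  # log-linear slope per ug/m3
    if linearized:
        # Assumed first-order approximation: excess risk ~ beta * delta_c.
        excess_fraction = beta * delta_c
    else:
        # Attributable fraction of the full log-linear form, (RR - 1) / RR.
        excess_fraction = 1.0 - math.exp(-beta * delta_c)
    return population * baseline_rate * excess_fraction


# Hypothetical inputs: one million adults, 1% baseline mortality, 8 ug/m3.
linear_estimate = early_deaths(8.0, 1_000_000, 0.01)
full_estimate = early_deaths(8.0, 1_000_000, 0.01, linearized=False)
```

For small exposures the two forms agree closely; the linearized form is always the larger of the two.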

Using a different risk estimate would result in a change in the total 
estimated impact. An expert elicitation performed by the EPA indicated 
a 1% (range 0.4-1.8%) increase in all-cause mortalities per 1 μg m⁻³ of 
exposure’. This would imply a roughly 70% increase in calculated early 
deaths, although all relative comparisons would remain the same. 
Another alternative based on the US Medicare cohort would imply 
a roughly 18% increase in the calculated early deaths for PM2.5, when 
applied to the same 30-plus population (again with all relative 
comparisons staying the same, but with the caveat that this was derived in 
a 65-plus cohort)”. Extended Data Table 4 shows how the estimate of 
total impacts, accounting for nonlinearity of the atmospheric response, 
is affected by the estimated relative risk, including the previously cited 
studies”””, refs. > and the results of a meta-analysis of 
epidemiological literature’. Although we cannot directly apply a nonlinear CRF, 
using the mean 2011 US concentration of PM2.5 in the Global Exposure 
Mortality Model (GEMM)”, we estimate a 35% increase in calculated 
early deaths. 
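
The roughly 70% figure quoted for the expert-elicitation estimate can be checked by comparing log-linear slopes. This is a back-of-the-envelope sketch that assumes both estimates are applied as linearized log-linear CRFs to the same exposure and population; the variable names are illustrative.

```python
import math

# Slope per ug/m3 implied by each central estimate.
beta_acs = math.log(1.06) / 10.0         # 6% per 10 ug/m3 (ACS cohort)
beta_elicitation = math.log(1.01) / 1.0  # 1% per 1 ug/m3 (expert elicitation)

# Under a linearized CRF, early deaths scale with the slope.
relative_increase = beta_elicitation / beta_acs - 1.0
print(f"relative increase in early deaths: {relative_increase:.0%}")
```

The ratio of slopes comes out close to 1.7, in line with the increase stated in the text.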

For ozone, we apply the respiratory disease mortality CRF of 
ref. °*; this is based on US exposure data from the same ACS study 
as above’. Impacts are calculated using the 8-hour maximum daily 
average ozone over the entire year, and applied to the same 
population. However, as with PM2.5, there is disagreement regarding the 
correct exposure-response curve to use. Extended Data Table 4 also 
includes estimates of ozone impacts, accounting for nonlinearity of 
the atmospheric response, using different ozone exposure-response 
curves from the literature’. Using the all-cause mortality CRF of 
ref. ” would result in a 110% increase in total mortality due to ozone 
exposure. Applying the all-cause mortality CRF of ref. °° to quantify 
ozone health impacts would instead result in a roughly 17% increase 
in the reported early deaths due to ozone exposure. We note that the 
CRF of ref. °° is based on mean summertime ozone exposure, whereas 
we measure annual-average exposure to 8-hour maximum ozone. 
However, ref. °°? showed that the response of respiratory mortality 
to chronic ozone exposure is similar when using either annual aver- 
age (12% increase per 10 ppbv) or warm season (10% per 10 ppbv) 
exposure. 

Population data are obtained from the Global Rural-Urban Mapping 
Project (GRUMP)* and LandScan* databases. For 2018, we scale the 
2011 population to match the 2017 US Census totals**. State 
population fractions over the age of 30 years old are obtained from the 
US Census Bureau for 2011 (ref. °”). The US baseline all-cause and 
respiratory disease incidence rates are obtained from the WHO for 2012 
(ref. °°). For both PM2.5 and ozone, the early-deaths confidence intervals 
reflect the reported uncertainty range for the CRF. Uncertainty in the 
summed PM2.5 and ozone impacts is calculated by performing a Monte 
Carlo simulation with 10° independent draws of each CRF, applying a 
triangular distribution to both. 
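
A minimal sketch of such a Monte Carlo combination is given below. It draws the PM2.5 and ozone impacts directly from triangular distributions, which is equivalent to drawing the CRFs only if impacts are linear in the CRF; that simplification, the number of draws, the seed and all numerical inputs are assumptions made for illustration.

```python
import random


def summed_impact_interval(pm, ozone, n_draws=100_000, seed=0):
    """Return an approximate 95% interval for the sum of two impacts.

    pm, ozone: (low, central, high) tuples; each is sampled independently
               from a triangular distribution with the central value as mode.
    """
    rng = random.Random(seed)
    totals = sorted(
        # random.triangular takes (low, high, mode) in that order.
        rng.triangular(pm[0], pm[2], pm[1])
        + rng.triangular(ozone[0], ozone[2], ozone[1])
        for _ in range(n_draws)
    )
    lower = totals[int(0.025 * n_draws)]
    upper = totals[int(0.975 * n_draws) - 1]
    return lower, upper


# Hypothetical (low, central, high) early-death estimates.
lower, upper = summed_impact_interval((40_000, 60_000, 80_000), (10_000, 20_000, 30_000))
```

Because the two draws are independent, the resulting interval is narrower than the one obtained by simply adding the lower bounds and adding the upper bounds.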


Intercomparison with other studies 

Pollution exchange on an intercontinental scale has previously been 
estimated for ozone’, PM2.5 (refs. ©), and both®, highlighting the 
influence of emissions from cross-continental sources. Regional 
studies have focused on individual species or species and pollutants, for 
example, the NOx-to-ozone effect between EU countries” and between 
US states”, sources of black-carbon impacts in parts of the US", and 
fine-scale monetized US PM2.5 impacts of different sectors®, in addition 
to other studies not using detailed chemistry-transport model (CTM) 
approaches. 

The main contribution of our work is the breakdown of both air- 
pollution causes and impacts in the US, and there are no studies to 
which direct comparisons at the level of disaggregation in our work 
can be made. However, the aggregate results of this study compare 
well with those in the existing literature. Ref. reports a roughly 25% 
decrease in PM2.5-attributable early deaths in the US between 2005 
and 2014, which is similar to the roughly 22% found here (interpolating 
for these two years). Our estimated total early deaths fall within the 
uncertainty ranges of recent studies, for example, the 79,300 (95% 
confidence interval 39,700-113,000) non-agriculture-related 2015 
US early deaths reported in ref. °; the 88,400 (66,800-115,000) 2015 
US PM2.5-attributable early deaths reported in ref. ”; and the central 
estimate of 107,000 total 2011 US PM2.5-attributable deaths (of which 
around 85,600 correspond to non-agriculture- and non-fire-related 
deaths) reported in ref. °. As in these studies, our 2011 estimates are 
higher than the 2010 estimates of ref. * (around 37,400 US early deaths 
for non-natural and non-agriculture-related deaths). In addition, 
refs. *° report different sectoral attributions, probably owing to the 
different emissions inventory used (EDGAR versus NEI). Our secto- 
ral and speciated relative attribution is similar (for 2005) to that of 
ref. ° (with the absolute values being different because of the different 
health-impacts function applied). 

We also compare our estimated changes in population exposure 
to data obtained from monitor sites. We find that, between 2005 and 
2011, the simulated population exposure to PM2.5 and ozone (taking 
into account nonlinearities) fell by roughly 20% and 8.6%, respectively. 
For the same two years, EPA’s annual trends from nationwide monitor 
sites show a decrease of 24% and 8% for PM2.5 and ozone concentrations, 
respectively”. 


Limitations 

In terms of air-quality modelling, even though the 0.5° × 0.666° 
(latitude × longitude; roughly 55 km × 55 km) resolution is sufficient for 
capturing state-level regional impacts, it may underestimate primary 
PM2.5 impacts and misrepresent ozone impacts in densely populated 
urban areas. This is in part due to the instantaneous dilution of the 
emissions, and, for ozone, to the highly nonlinear relationship between 
ozone formation and background VOC and NOx concentrations. The 
EPA NEIs that are used here, and in policy assessments, are also only 
an approximation, with some known issues that we do not explicitly 
account for®**”. This could affect both the baseline calculation of the 
sensitivity and the absolute impacts attribution. In addition, the emis- 
sions presented for 2018 are forecasted from the NEI2011 inventory. 
Such forecasts are inherently uncertain” “. Finally, previous studies 
have shown a tendency for GEOS-Chem simulations to overestimate 
nitrates”. This may result in artificially increased PM2.5 formation in 
response to combustion emissions. 

In estimating health impacts, the choice of CRF is critical for 
early-death calculations. Here we apply the all-cause CRF for PM2.5 from the 
ACS cohort study’ because of the large and nationally representative 
cohort it is based on, and because of its wide application in 
PM2.5-attributable health-impact estimates in the literature. This CRF was 
derived for pre-2000 concentrations, and we thus assume no hetero- 
geneity in effect estimates over time (as concentrations change). An 
analysis of the level of disagreement between different CRFs, and the 
effect on our estimated impacts, is presented in the ‘Health impacts’ 
section above. 

We assume equal toxicity between different PM2.5 species, 
consistent with EPA’s practice. However, epidemiological work on 
differential toxicity has provided estimates for mortality predictors based on 
exposure to individual PM2.5 constituents”. Sulfates and black carbon 
have specifically been highlighted because of their suspected higher 
toxicity among PM2.5 constituents”’”®. 

Here we choose to quantify all-cause and respiratory-disease 
mortality for long-term exposure to PM2.5 and ozone, respectively, but note that 
human exposure to PM2.5 and ozone has been correlated with a variety 
of specific health endpoints, such as neurological diseases”, various 
forms of cancer®®, low birth weight®, and others. Short-term exposure 
to PM2.5 and ozone has also been found to correlate causally with an 
increased likelihood of early death®’, and is not included here. 
Nonfatal (morbidity) effects attributable to PM2.5 and ozone exposure 
(including acute respiratory symptoms, exacerbated asthma, days of work 
and school lost, upper and lower respiratory symptoms, nonfatal heart 
attacks, acute bronchitis, and hospital and emergency-department 
visits) are also not captured. In addition, given the aggregate nature 
of the adjoint objective function, we present results for the aggregate 
state-level population. Air-pollution-related health impacts, however, 
have been known to disproportionately affect different races, ages and 
socioeconomic backgrounds*”. These are not broken down here. 

We also note that this work quantifies the pollution exchange 
between the contiguous US states, and does not take into account 
sources outside of this domain (for example, Mexico, Canada and 
intercontinental sources®**). In addition, while changes in emissions are probably 
the largest driver of changes in the cross-state, sectoral and speciated 
patterns between the years, effects of meteorological changes can 
also contribute, and are not specifically decoupled here. Finally, for 
simultaneous, large changes in multiple pollutant emissions, there may 
be nonlinear interactions. These interactions could change the total 
impact relative to that calculated for individual sectors here, where 
independent changes are assumed. For this reason, and as discussed 
above, we calculate and present total impacts (aggregated across all 
sectors) using forward simulations in which all emissions are reduced 
simultaneously. 


Data availability 


The cross-state source-receptor matrices generated and analysed here, 
together with sector definitions, are available in the 4TU.ResearchData 
repository at https://doi.org/10.4121/uuid:edfc5304-39ed-4556-a95a-f8b3313f7cfc. 


Code availability 


The atmospheric modelling code used is publicly available; instructions 
for download are given at http://wiki.seas.harvard.edu/geos-chem/index.php/GEOS-Chem_Adjoint. 

Extended Data Fig. 1 | Source-receptor matrix showing total impacts in 2011 for the contiguous US. ‘By each state’ indicates sources; ‘in each state’ indicates 

and thus the total results are scaled to match this. The ‘marginal sensitivity’ 
lines indicate the gradient of the response obtained by our GEOS-Chem adjoint 
simulation, and are used for calculations of individual sector and species 
impacts (where individual perturbations are of smaller size). The difference 
between the zero intercept of the two lines constitutes the ‘interaction’ effect. 
All values are population-weighted means for 2011. 


Extended Data Table 1 | Primary PM2.5, NOx and SOx emissions totals for 2005, 2011 and 2018 

2005 
                            PM2.5   NOx     SOx     NH3     CO 
Electric power generation   0.46    3.42    9.46    0.02    0.57 
Industry                    0.57    215     2355    0.13    3.03 
Commercial/residential      0.69    0.76    0.49    0.04    4.82 
Road transportation         0.27    8.17    0.16    0.14    39.3 
Marine                      0.07    1.30    0.45    0.00    0.18 
Rail                        0.03    1.01    0.07    0.00    0.11 
Aviation                    0.003   0.60    0.06    --      0.26 
  (LTO %)                   (15%)   (13%)   (15%)           (44%) 

2011 
                            PM2.5   NOx     SOx     NH3     CO 
Electric power generation   0.18    1.78    4.10    0.02    0.69 
Industry                    0.41    2.52    1.20    0.09    2.49 
Commercial/residential      0.62    0.61    0.22    0.12    3.67 
Road transportation         0.19    5.10    0.03    0.11    23.1 
Marine                      0.02    0.60    0.05    0.00    0.11 
Rail                        0.02    0.77    0.01    0.00    0.12 
Aviation                    0.003   0.58    0.05    --      0.17 
  (LTO %)                   (13%)   (13%)   (13%)           (40%) 

2018 
                            PM2.5   NOx     SOx     NH3     CO 
Electric power generation   0.19    1.42    1.32    0.04    0.74 
Industry                    0.52    2.15    0.85    0.09    2.42 
Commercial/residential      0.64    0.60    0.12    0.12    3.82 
Road transportation         0.11    2.38    0.01    0.08    14.0 
Marine                      0.01    0.37    0.002   0.00    0.09 
Rail                        0.02    0.67    0.001   0.00    0.14 
Aviation                    0.003   0.63    0.05    --      0.15 
  (LTO %)                   (12%)   (12%)   (12%)           (33%) 


Emissions expressed in teragrams per year for each sector for 2005, 2011 and 2018. Emissions for all sectors apart from aviation are derived from EPA's NEI. Aviation emissions are taken from 
the AEDT inventory”? (for years 2006, 2010/2012 and 2015) and include emissions that occurred over the contiguous US. The percentages of aviation emissions that occur within around 1 km of 
altitude (landing and take-off (LTO) emissions) are given in parentheses, and are the aviation emissions included in our analysis. 


Extended Data Table 2 | Five states with the greatest reduction in annual early deaths between 2005 and 2018 

By each state 
State   2005                   2018                   Δ 
WV      1,740 [1,240-2,250]    745 [540-950]          1,000 (57%) 
AL      2,640 [1,920-3,350]    1,280 [950-1,610]      1,350 (51%) 
TN      2,370 [1,730-3,020]    1,210 [880-1,550]      1,160 (49%) 
MD      3,240 [2,250-4,230]    1,660 [1,140-2,180]    1,580 (49%) 
KY      2,750 [1,940-3,560]    1,430 [1,020-1,850]    1,320 (48%) 

In each state 
State   2005                   2018                   Δ 
AL      2,080 [1,520-2,640]    990 [730-1,260]        1,080 (53%) 
ME      330 [240-430]          170 [120-220]          160 (48%) 
MS      970 [710-1,220]        510 [380-640]          460 (47%) 
VA      3,760 [2,680-4,840]    2,010 [1,440-2,590]    1,750 (46%) 
WV      790 [570-1,010]        420 [310-540]          370 (46%) 


Data are given in terms of early deaths caused by emissions from each state and in each state. Values in square brackets show 95% confidence intervals. 


Extended Data Table 3 | Early deaths attributable to each sector and species (that lead to PM2.5 and/or ozone formation) for 2005, 2011 and 2018 

(a) 
Sector                          2005                       2011                       2018 
Electric power generation       24,400 [16,900-31,800]     12,800 [9,100-16,600]      8,500 [6,000-10,900] 
Industry                        22,400 [15,600-29,100]     19,100 [13,400-24,700]     18,200 [12,900-23,600] 
Commercial/residential          20,400 [14,100-26,800]     26,800 [18,300-35,400]     28,200 [19,200-37,300] 
Road transportation             37,000 [26,500-47,600]     30,800 [21,800-39,800]     18,400 [12,900-23,800] 
Marine transportation           4,600 [3,200-5,900]        1,500 [1,100-2,000]        810 [590-1,000] 
Rail transportation             2,300 [1,600-2,900]        2,400 [1,700-3,100]        2,100 [1,500-2,800] 
Aviation                        130 [80-200]               210 [140-270]              220 [150-290] 
Non-linear PM2.5 interactions   -29,500                    -28,400                    -23,500 
Non-linear ozone interactions   +17,900                    +18,000                    +13,100 
Total                           96,600 [74,200-125,000]    83,300 [62,400-104,200]    66,100 [49,300-82,900] 

(b) 
Species                         2005                       2011                       2018 
NOx                             30,000 [21,300-38,700]     28,600 [20,500-36,800]     19,600 [14,000-25,200] 
SO2                             21,000 [14,100-28,000]     9,900 [6,600-13,100]       4,300 [2,900-5,800] 
Primary PM2.5                   34,800 [23,200-46,400]     31,000 [20,600-41,300]     30,200 [20,100-40,200] 
NH3                             14,400 [9,600-19,200]      16,700 [11,800-23,500]     17,400 [11,600-23,200] 
Other                           10,900 [8,000-13,800]      6,500 [4,700-8,300]        4,800 [3,500-6,200] 
Non-linear PM2.5 interactions   -29,500                    -28,400                    -23,500 
Non-linear ozone interactions   +17,900                    +18,000                    +13,100 
Total                           96,600 [74,200-125,000]    83,300 [62,400-104,200]    66,100 [49,300-82,900] 

Values in square brackets are 95% confidence intervals. Only the mean effect is reported for the nonlinear interaction terms. 


Extended Data Table 4 | Alternative CRF application to 2011 early deaths for all sectors, for PM2.5 and ozone 


Pollutant Study Mortality end-points Early deaths 
MES lat 7 All-cause ree 600] 
Ref. 2 All-cause Bere 
Ref. 50 All-cause 164,900-68, 6001 
Ref. 9 All-cause (36,806-73,600] 
Ref. 49 All-cause rare 00] 
Ref. 51 All-cause ee 
Ref. 12 NCD+LRI* ee eee 
Ref. 7 Cardiopulmonary p70 48-00) 
Ref. 9 Cardiovascular Hazes 260) 
Ref. 49 Cardiovascular ee o a08 
Ref. 51 Cardiovascular eiooe 5 600i 
Ozone*** Ref. 52 Respiratory 28,100 
(MDA8, annual avg.) [18, 700-37, 400] 
Ref. 52 (ine aed avg.) 129,800°1 16,100) 
— All-cause 32,900 
(24-hr avg. warm season) [29,900-35,900] 
Ref. 53 prayer season) 1,400-16,300) 


Atmospheric nonlinearity is taken into account. The CRFs used to calculate the estimates in the main text are shown in italics. As in the main text, we apply these to the 30-plus population, using 
corresponding data for disease-specific baseline incidence rates from the WHO for 2012. Uncertainty intervals (in square brackets) reflect the 95% confidence intervals for each CRF. 

*The GEMM model health end-point is all nonaccidental deaths, almost all of which are due to noncommunicable diseases (NCDs) and lower respiratory infections (LRIs). We use the all-cause 
mortality incidence rate from the WHO, excluding all injury-related deaths. 

**To estimate the early deaths from the GEMM model, we use the parameters provided in ref. 12 for ages above 25 years, excluding the Chinese male cohort study, and use the mean population-weighted 
concentration of PM2.5 in the US to determine the local relative risk per unit increase in exposure. The uncertainty intervals here reflect one standard error in parameter θ of the model. 
***Note that the different CRF studies compared here assume different measures of ozone exposure (annual mean 8-hour maximum in ref. 52; warm-season (April-September) mean in ref. °°; and 
warm-season 1-hour daily maximum in ref. °°). We apply all of these using the annual mean 8-hour maximum ozone exposure. MDA8, maximum daily 8-hour average. 


Tobacco smoking causes lung cancer’, a process that is driven by more than 60 
carcinogens in cigarette smoke that directly damage and mutate DNA*». The profound 
effects of tobacco on the genome of lung cancer cells are well-documented® ”, but 
equivalent data for normal bronchial cells are lacking. Here we sequenced whole 
genomes of 632 colonies derived from single bronchial epithelial cells across 16 
subjects. Tobacco smoking was the major influence on mutational burden, typically 
adding from 1,000 to 10,000 mutations per cell; massively increasing the variance 
both within and between subjects; and generating several distinct mutational 
signatures of substitutions and of insertions and deletions. A population of cells in 
individuals with a history of smoking had mutational burdens that were equivalent to 
those expected for people who had never smoked: these cells had less damage from 
tobacco-specific mutational processes, were fourfold more frequent in ex-smokers 
than current smokers and had considerably longer telomeres than their more- 
mutated counterparts. Driver mutations increased in frequency with age, affecting 
4-14% of cells in middle-aged subjects who had never smoked. In current smokers, at 
least 25% of cells carried driver mutations and 0-6% of cells had two or even three 
drivers. Thus, tobacco smoking increases mutational burden, cell-to-cell 
heterogeneity and driver mutations, but quitting promotes replenishment of the 
bronchial epithelium from mitotically quiescent cells that have avoided tobacco 
mutagenesis. 


Lung cancer kills more people globally than any other cancer, and 
80-90% of those deaths are attributable to tobacco exposure’. 
Our model for how tobacco causes lung cancer emphasizes direct 
mutagenesis from the numerous (more than 60) carcinogens in ciga- 
rette smoke*>, combined with indirect effects such as inflammation, 
immune suppression and infection. As recognized first in the sequenc- 
ing of the TP53 gene’ and more recently in genome-wide sequencing of 
lung cancers®”’, tobacco exposure leads to both an increase in somatic 
mutational burden and an altered spectrum of mutations. Clones of 
lung cancer cells from a smoker typically have tens of thousands of 
somatic mutations®”’; of these, a small handful (probably fewer than 
20) drive the biology of the tumour” ”. 

Epidemiological studies have quantified the relationships between 
lung cancer and duration of smoking, intensity of smoking, type of 
smoking and timing of smoking cessation’ >“. Interpreting these obser- 
vations from population cohorts in terms of the molecular basis for 
tobacco carcinogenesis is challenging. Under a model in which lung 
cancer requires n driver mutations, an exposure that, say, increases 
mutation rates by k-fold should increase incidence by around k^n across 
a range of growth patterns”. However, in a paradox first noted in 1971°, 
the dose-response relationship between the number of cigarettes 
smoked per day and the risk of lung cancer is linear*"* (that is, k^1) or, at 
most, weakly quadratic’’. The benefits of smoking cessation likewise do 
not fit into multistage models of cancer in a straightforward manner”. 
By stopping smoking in middle age or earlier, smokers avoid most of the 
risk of tobacco-associated lung cancer. This benefit begins to emerge 
almost immediately and accrues steadily with time. Of two people who 
smoked the same total number of cigarettes across their lifetime, why 
the person with longer duration of cessation should have a lower risk 
of lung cancer is difficult to explain if tobacco induces carcinogenesis 
exclusively by increasing the mutational burden. 
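
The paradox described above can be made concrete with a short sketch. Under the simple multistage assumption stated in the text (incidence scales as the k-fold increase in mutation rate raised to the number of required drivers n), a linear dose-response corresponds to n = 1. The function name and the chosen values of k and n are illustrative only.

```python
def relative_incidence(k, n_drivers):
    """Fold-increase in incidence when the mutation rate rises k-fold
    and n_drivers driver mutations are required (multistage assumption)."""
    return k ** n_drivers


# Doubling the mutation rate under different numbers of required drivers.
for n in (1, 2, 3, 5):
    print(f"n = {n}: incidence increases {relative_incidence(2.0, n):g}-fold")
```

With several required drivers, even a modest increase in mutation rate would predict a steeply supralinear rise in incidence, which is not what the epidemiological data show.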


Sequencing single-cell-derived colonies 


We recruited 16 individuals to assess the landscape of somatic 
mutations in normal bronchial epithelium: 3 children, 4 individuals who had 
never smoked (‘never-smokers’), 6 ex-smokers and 3 current smokers 
(Supplementary Table 1). For ethical reasons, samples could only be 
obtained from subjects who underwent a bronchoscopy for clinical 
indications. The never-smokers and current smokers had a bronchos- 
copy to investigate changes that were eventually diagnosed as benign. 
Of the ex-smokers, two had had a previous cancer treated with curative 
intent, and five had a carcinoma in situ or invasive squamous cell car- 
cinoma that was the indication for the bronchoscopy. The children in 
the cohort underwent a bronchoscopy for investigation or follow-up 
of congenital anomalies: all had normal bronchial epithelium. 

Samples of airway epithelium were obtained from biopsies or brush- 
ings of main or secondary bronchi. These were dissociated into single 
cells, and epithelial cells positive for epithelial cellular adhesion 
molecule (EPCAM+) were flow-sorted (one to a well) onto mouse feeder 
cells allowing basal cell attachment and growth (Extended Data Fig. 1a). 
Each cell was independently cultured to obtain single-cell-derived 
colonies that expressed the transcripts expected for basal cells of pseu- 
dostratified bronchial epithelium (Extended Data Fig. 1b). Around 
15-40% of flow-sorted cells typically produced colonies (Extended 
Data Fig. 1c), confirming that the sequenced cells were drawn from 
a prevalent and representative population of epithelial cells. Colo- 
nies underwent whole-genome sequencing to an average coverage of 
16x (Supplementary Table 2, Extended Data Fig. 2a, b). Using a xenograft 
pipeline to flag non-human sequencing reads, somatically acquired 
mutations were identified from reads specific to the human genome. 
In nearly all colonies, the variant allele fraction (VAF) of mutations was 
around 50% on average, which is consistent with contamination-free 
colonies derived from a single bronchial cell (Extended Data Fig. 2c). To 
remove variants that had possibly been acquired in vitro, we excluded 
mutations with a VAF of less than 30% that were present in only a single 
colony (Extended Data Fig. 2c). Occasional colonies had a low mean VAF 
(Extended Data Fig. 2d), consistent with seeding by two bronchial cells; 
these colonies were excluded from downstream analyses. We estimated 
that a sequencing depth of 8x gave a sensitivity for variants of 70-75%, 
and this increased to more than 95% at a depth of 15x (Extended Data 
Fig. 2e). The majority of colonies had a sequencing depth greater than 
15x, and we set a minimum cut-off of 8x for inclusion. 
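
The two numerical filters described in this paragraph (a minimum depth of 8x per colony, and removal of variants with a VAF below 30% that are private to one colony) can be sketched as follows. This is not the authors' pipeline: the data layout, the function name and the example records are hypothetical, and the sketch counts colony membership only among colonies that pass the depth cut-off, which is an assumption.

```python
MIN_DEPTH = 8    # minimum mean sequencing depth for a colony to be included
MIN_VAF = 0.30   # VAF threshold applied to variants seen in a single colony


def filter_variants(variants, colony_depth):
    """Keep variants from adequately covered colonies, dropping low-VAF
    variants that are present in only one colony.

    variants:     list of dicts with keys 'id', 'colony' and 'vaf'
    colony_depth: dict mapping colony name -> mean sequencing depth
    """
    kept_colonies = {c for c, depth in colony_depth.items() if depth >= MIN_DEPTH}

    # Record the included colonies in which each variant is observed.
    colonies_per_variant = {}
    for v in variants:
        if v["colony"] in kept_colonies:
            colonies_per_variant.setdefault(v["id"], set()).add(v["colony"])

    kept = []
    for v in variants:
        if v["colony"] not in kept_colonies:
            continue
        private = len(colonies_per_variant[v["id"]]) == 1
        if private and v["vaf"] < MIN_VAF:
            continue  # possibly acquired in vitro
        kept.append(v)
    return kept


# Hypothetical example: colony C falls below the depth cut-off.
depths = {"A": 16, "B": 15, "C": 6}
calls = [
    {"id": "v1", "colony": "A", "vaf": 0.50},
    {"id": "v2", "colony": "A", "vaf": 0.20},
    {"id": "v3", "colony": "A", "vaf": 0.20},
    {"id": "v3", "colony": "B", "vaf": 0.45},
    {"id": "v4", "colony": "C", "vaf": 0.50},
]
kept = filter_variants(calls, depths)
```

In this example the low-VAF variant shared by colonies A and B is retained, whereas the low-VAF variant private to colony A is dropped.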

The final dataset comprises catalogues of somatic mutations from 
the whole genomes of 632 single bronchial cells. Five patients had a 
squamous cell carcinoma or carcinoma in situ, three of which we also 
sequenced. Normal basal cells from these patients shared no clonal 
relationships with the carcinomas, and we found no systematic dif- 
ferences in mutational burden between normal cells in the vicinity of 
carcinoma in situ lesions and cells in regions that were histologically 
normal (Extended Data Fig. 2f). 


Mutational burden 


The burden of somatic substitutions per cell showed considerable het- 
erogeneity both across the cohort and even within individual patients 
(Fig. 1a). Using linear mixed-effects (LME) models, we assessed factors 
that influenced the mutational burden (Supplementary Code). Single- 
base substitutions increased significantly with age, at an estimated rate 
of 22 per cell per year (95% confidence interval (CI), 20-25; P=10°; 
Fig. 1b). Previous or current smoking significantly increased the mean 
burden of substitutions (P=0.0002) by an estimated 2,330 per cell (95% CI, 
1,180-3,480) in ex-smokers and 5,300 per cell (95% CI, 3,660-6,930) 
in current smokers. 
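
To give a sense of scale, the fixed-effect estimates quoted above can be combined into a rough expected burden. The sketch below uses only the point estimates from the text; it omits any model intercept (not reported here) and all random effects, so it is an illustration of magnitudes rather than a reproduction of the LME model.

```python
SUBSTITUTIONS_PER_YEAR = 22  # estimated age effect, per cell per year
SMOKING_EFFECT = {"never": 0, "ex": 2330, "current": 5300}  # extra per cell


def expected_substitutions(age_years, smoking_status):
    """Rough expected substitutions per cell from the fixed effects alone."""
    return SUBSTITUTIONS_PER_YEAR * age_years + SMOKING_EFFECT[smoking_status]


for status in ("never", "ex", "current"):
    print(status, expected_substitutions(60, status))
```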

The effects of age and smoking were expected but, more surprisingly, 
smoking also markedly increased the variability in mutational burden 
from cell to cell, even within the same individual. Among closely 
collocated cells from a small biopsy of normal airway from a given subject, the 
estimated standard deviation was 2,350 per cell for ex-smokers and 2,100 
per cell for current smokers, compared with 140 per cell for children and 
290 per cell for adult never-smokers (P< 10° by LME for within-subject 
heterogeneity of variance across smoking categories). There was also 
heterogeneity between subjects: the estimated standard deviation in 
mean substitution burden across individuals was 1,200 per cell for 
ex-smokers and 1,260 per cell for current smokers, compared with 90 per cell 
for non-smokers (P=10°° by LME for heterogeneity of variance). 

Although most cells in ex-smokers or current smokers had a consider- 
ably higher substitution burden than cells in never-smokers, a fraction 
of cells in these patients had burdens within the range expected for 
never-smokers of an equivalent age (Fig. 1c). For many of these patients, 
the distribution of mutational burden was distinctly bimodal, with 
one mode in the near-normal range and the other mode exhibiting a 
substantially increased mutational burden (Extended Data Fig. 3a). 
Notably, although cells with a near-normal mutational burden were 
rarely present in current smokers, their relative frequency was on aver- 
age fourfold higher in ex-smokers (95% CI, 2.0-7.9-fold; P=3 x 10° by 
log-linear model), typically accounting for 20-40% of all cells studied. 
Colonies with a near-normal mutational burden expressed the same set 
of airway basal cell genes as did colonies with an increased mutational 
burden, and had the same tightly associated, cobbled architecture in 
culture (Extended Data Fig. 3b, c), confirming that they derived from 
bronchial epithelial cells. 

Among current and ex-smokers, we found that mutational burden 
was not significantly correlated with the duration of cigarette smok- 
ing or the number of cigarettes smoked per day, even if near-normal 
cells were excluded. However, the small numbers of subjects and large 
within-subject heterogeneity limit our statistical power for this 
analysis, and definitive analysis will require much larger sample sizes. 

Insertions and deletions (indels) showed associations similar to those of 
substitutions, increasing steadily with age (0.7 indels per cell per year; 
95% CI, 0.6-0.8; P=10~°) and tobacco smoking (101 extra indels per 
cell in smokers; 51 in ex-smokers; P= 0.001; Extended Data Fig. 4a). 
Generally, the normal bronchial epithelial cells had few copy-number 
changes or structural variants (Extended Data Fig. 4b)—this repre- 
sents a qualitative difference from lung cancers, which tend to have 
large numbers of structural abnormalities®”””. There were occasional 
examples of more-complex structural events in the bronchial epithelial 
cells, including chromoplexy (Extended Data Fig. 4c) and even chro- 
mothripsis in a cell from a child (Extended Data Fig. 4d). The latter is 
particularly interesting, given that recent data suggest that driver-gene 
fusions in lung adenocarcinoma can arise through complex structural 
events that occur early in life”. 


Mutational signatures 


A range of mutational processes operate in lung cancers, driven both 
by the exogenous carcinogens present in tobacco smoke and by endog- 
enous DNA damage. These processes leave characteristic signatures in 
the genome®. We built phylogenetic trees for each patient, and applied 
a Bayesian de novo mutational-signature discovery algorithm to muta- 
tions assigned to each branch. We also included samples from squa- 
mous cell lung cancers’ and control samples cultured in vitro” in the 
signature analysis to maintain comparability with previous analyses® 
(Fig. 2). Few mutations in our samples (typically 10-30 per cell) were 
attributed to SBS-18, the signature that accounted for all variants in the 
control samples”, which confirmed that mutations acquired in vitro 
were minimal in our dataset. Similar results emerged using a different 
mutational-signature algorithm”? (Extended Data Fig. 5a-c). 

A large proportion of mutations in all subjects was attributed to the 
endogenous mutational signature SBS-5, which accumulated linearly 
with age (Fig. 2c, d). As reported previously’®, the absolute number 
of mutations attributed to this signature was higher in those with a 
smoking history (ex-smokers 1,140 per cell, 95% CI, 590-1,700; 
current smokers 2,200 per cell, 95% CI, 1,590-2,810; P< 10°). Signature 
SBS-1, which comprises C>T mutations at CpG dinucleotides, contributed 
larger proportions of mutations in children than adults, but the 
absolute numbers of SBS-1-attributed mutations continued to increase 
linearly with age through adulthood (Fig. 2c, d). Presumably, then, 
SBS-1 is enriched during early lung development and continues steadily 
throughout life, but other signatures become proportionally more 
active in adulthood. A novel signature (Sig-A; Fig. 2b) was universally 
present across samples. It has some resemblance to SBS-5, and likewise 
increased linearly with age. 

Signatures SBS-2 and SBS-13, which are caused by mutagenesis mediated 
by APOBEC3A or APOBEC3B, showed striking heterogeneity: they 
were mostly absent from bronchial cells, but occasionally contributed 
hundreds of mutations in an individual cell, even in children. This 
activity appeared to be temporally restricted: individual branches of a 
phylogenetic tree had high proportions of SBS-2 or SBS-13 despite their 
absence from antecedent and descendent branches (Fig. 3a, Extended 
Data Fig. 6). This implies that the episodic activity of APOBEC-mediated 
mutagenesis observed in cell lines” extends to somatic cells in vivo, as 
the proportion of mutations attributed to APOBEC enzymes on a given 
branch of the phylogenetic tree does not predict past or future rates 
of mutagenesis in that lineage. 

Fig. 1 | Mutational burden in normal bronchial epithelium. a, Burden of 
single-base substitutions (SBSs), small indels and double-base substitutions 
(DBSs) across patients in the cohort. The box-and-whisker plots show each 
subject, with the boxes indicating median and interquartile range and the 
whiskers denoting the range. The overlaid points are the observed mutational 
burden of individual colonies. b, Relationship of burden of substitutions per 
cell with age. The points represent individual colonies (n= 632) and are 
coloured by smoking status. The black line represents the fitted effect of age 
on the burden of substitutions, which was estimated from LME models after 
correction for smoking status and within-patient correlation structure. The 
blue shaded area represents the 95% CI for the fitted line. c, Fraction of cells 
with a near-normal mutational burden in current and ex-smokers. 


Three substitution signatures were largely restricted to current or 
ex-smokers. Signature SBS-4 was expected: this is the predominant
signature in lung cancers from smokers”* and is recapitulated by in vitro 
exposure to polycyclic aromatic hydrocarbons”. SBS-16 comprised 
5-15% of mutations in several current or ex-smokers, but was absent 
from never-smokers. This signature, with its distinctive pattern of 
transcription-coupled damage and repair” (Extended Data Fig. 5d), 
correlates with alcohol and tobacco exposure in hepatocellular car- 
cinomas®”’, but has not been linked with tobacco exposure in lung 
cancers previously. 

A new mutational signature (Sig-B) was extracted, which comprised
predominantly T>A and T>C mutations and was evident only in patients
with a history of smoking (Fig. 2b). The signature was mostly present at
low rates, but in one ex-smoker it contributed up to 15% of mutations per
cell. We found a strong transcriptional strand bias, whereby the transcribed
strand showed decreased rates of mutation at the adenine in the
T:A pairing. This is consistent with in vitro data that show that purines
are more reactive than pyrimidines with mutagens in tobacco smoke’.

Fig. 2 | Mutational signatures in normal bronchial epithelium. a, Stacked bar
plot showing the proportional contribution of mutational signatures to
single-base substitutions across the n = 632 colonies from normal bronchial cells,
extracted using a hierarchical Dirichlet process (HDP). Within each patient,
colonies are sorted from left to right by increasing mutational burden (bar chart
in dark grey above coloured signature-attribution stacks). The dashed black
vertical lines in current and ex-smokers denote the cut-off between cells with a
near-normal and an increased mutational burden. b, Trinucleotide context
spectrum on transcribed and untranscribed strands of two new SBS signatures
(Sig-A and Sig-B). The six substitution types are shown across the top. Within
each substitution type, the trinucleotide context is shown as four sets of eight
bars, grouped by whether an A, C, G or T, respectively, is 5′ to the mutated base,
and within each group of eight by whether A, C, G or T is 3′ to the mutated base.
The activity of the mutational signature on the untranscribed strand is shown in a
pale colour; on the transcribed strand it is shown in a darker colour. c, Number of
base substitutions attributed to the three endogenous signatures across the
cohort (y axis; n = 632 colonies) shown according to the age of the subject (x axis).
The black line represents the fitted effect of age, which was estimated from LME
models after correction for smoking status and within-patient correlation
structure. The blue shaded area represents the 95% CI for the fitted line. The
quoted P values for the fixed effects of age and smoking are derived from the full
LME models. d, Estimated effect sizes of age, smoking status, between-patient
and within-patient standard deviation of seven signatures (points) with 95% CIs
(horizontal lines). Estimates are derived from LME models (n = 632).

As described above, an unexpectedly high fraction of cells in ex-smokers
had a near-normal mutational burden. These cells had considerably
lower proportions of SBS-4 mutations than cells with an increased
mutational burden in the same patients. Instead, the distribution of
signatures in these near-normal cells resembled that seen in never-smokers,
with prominent endogenous signatures such as SBS-5, SBS-1
and Sig-A. Phylogenetically, cells with a near-normal mutational burden
showed polyclonal origins (Fig. 3a, Extended Data Fig. 6), suggesting
that they do not arise from the expansion of a single ancestral cell.

Fig. 3 | Driver mutations in normal bronchial epithelial cells. a, Phylogenetic
trees showing clonal relationships among normal bronchial cells in three
representative subjects. Branch lengths are proportional to the number of
mutations (x axis) specific to that clone or subclone. Each branch is coloured by
the proportion of mutations on that branch that are attributed to the various
SBS signatures. The driver mutations that were identified in each branch are
also shown (black, SBS; red, indel). b, Total number of colonies with mutations
(left) and number of unique mutations (right) in key cancer genes across the
sample set (n = 632). ** represents genes that are significant (q < 0.05 by
dNdScv) when correction for multiple-hypothesis testing is applied across all
coding genes; * represents genes that are significant (q < 0.05 by dNdScv)
when correction for multiple-hypothesis testing is applied across known driver
genes in lung cancers and normal squamous tissues (exact q values are
provided in Supplementary Table 4). c, Fraction of colonies with 0, 1, 2 or 3
driver mutations across the 16 subjects. d, Distribution of driver mutations
across colonies in the cohort, coloured by type of mutation. Loss of
heterozygosity (LOH) that affects driver mutations is also shown. e, Frequency
of driver mutations that are shared by more than one colony in a patient (dark
blue) versus those found in a single colony (light blue) across different cancer
genes.

Signatures of indels and double-base substitutions that were
observed in normal bronchial epithelium matched those extracted
from lung cancers” and those generated in vitro by exposure of cells
to polycyclic aromatic hydrocarbons” (Extended Data Figs. 7, 8). A history
of tobacco smoking was particularly associated with a signature of
double-base substitutions at CpC (equivalently GpG) dinucleotides, a
finding that is in accordance with the high rates of C>A (G>T) single-base
substitutions in SBS-4. Similarly, tobacco exposure was associated
with an indel signature of single-base deletions of cytosines (guanines)
in our dataset. Together, these data suggest that the propensity of
polycyclic aromatic hydrocarbons in tobacco smoke to bind guanine
nucleotides can result in a range of mutation types even in normal bronchial
epithelial cells, including single-base substitutions, dinucleotide
substitutions and small indels.


Driver mutations 

To assess whether any mutations are under positive selection in normal
bronchial epithelium, we applied an algorithm, dNdScv, which identifies
and quantifies the number of excess non-synonymous mutations
compared with the number expected from the rate of synonymous
(neutral) variants, correcting for local variation in mutation rates”. With
hypothesis testing applied across all coding genes, three were significant:
NOTCH1 (20 unique non-synonymous variants; q = 1 × 10>); TP53
(7 unique non-synonymous variants; q = 2 × 10“); and ARID2 (7 unique
non-synonymous variants; q = 4 × 10“) (Fig. 3b). When hypothesis testing
was restricted to genes that are mutated in lung cancers”3"86
and normal squamous tissues” ”’, FAT1, PTEN, CHEK2 and ARID1A were
also significant, showing the expected patterns of protein-truncating
mutations (Supplementary Tables 3-5, Extended Data Fig. 9a). This set
of significant genes closely resembles those under positive selection in
squamous cell lung cancers" and other normal squamous tissues”’ °°.
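
The logic of this kind of selection test can be illustrated with a deliberately simplified dN/dS calculation. The Python sketch below is not the dNdScv implementation (which, as noted above, also corrects for local variation in mutation rates); the function names and all counts are hypothetical and serve only to show how an excess of non-synonymous mutations is quantified against the synonymous background.

```python
def dnds_ratio(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Naive dN/dS: non-synonymous rate per site over synonymous rate per site."""
    if n_syn == 0 or nonsyn_sites == 0 or syn_sites == 0:
        raise ValueError("need a non-zero synonymous count and site totals")
    dn = n_nonsyn / nonsyn_sites
    ds = n_syn / syn_sites
    return dn / ds


def excess_nonsyn(n_nonsyn, ratio):
    """Estimated non-synonymous mutations beyond the neutral expectation."""
    if ratio <= 1:
        return 0.0
    return n_nonsyn * (1 - 1 / ratio)


# Hypothetical gene: 20 non-synonymous and 2 synonymous mutations, with
# three times as many non-synonymous as synonymous sites available.
ratio = dnds_ratio(20, 2, nonsyn_sites=3000, syn_sites=1000)
excess = excess_nonsyn(20, ratio)
```

A ratio above 1 indicates more non-synonymous mutations than the synonymous rate would predict, which is the pattern interpreted as positive selection.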

Driver mutations were more frequent in patients with a history of 
tobacco smoking (Fig. 3c, Extended Data Fig. 9b). No candidate driver 
mutations were identified in cells from children, and 4-14% of cells 
in adult never-smokers had drivers; by contrast, in current smokers, 
at least 25% of cells carried at least one driver. Furthermore, a small 
fraction of cells in smokers had two or even three coding driver point 
mutations (Fig. 3d)—as many as is seen in some lung cancers”. We used 
generalized LME models to quantify these effects (Supplementary 
Code). Driver mutations were significantly more frequent in individuals 
with a smoking history and showed an increase of 2.1-fold in current
smokers compared to never-smokers (95% CI, 1.0-4.4; P= 0.04). The
number of driver mutations also increased independently with age,
with every decade of life increasing the number of drivers per cell by
1.5-fold (95% CI, 1.2-2.1; P=0.004), a pattern reminiscent of the increasing
number of driver mutations with age in the oesophagus’. Finally,
the number of driver mutations doubled on average for every 5,000
extra somatic mutations per cell, independent of the other variables
(95% CI, 1.4-2.7; P= 0.0003). 
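
As a worked illustration of how these fold changes combine, the Python sketch below multiplies the three point estimates quoted above. It assumes that the fixed effects act multiplicatively (as they would under a log link), and it ignores the confidence intervals and the random effects of the fitted models; the function name and its arguments are hypothetical.

```python
def relative_driver_rate(extra_decades=0.0, current_smoker=False,
                         extra_mutations=0.0):
    """Expected drivers per cell relative to a reference never-smoker cell.

    Uses the point estimates quoted in the text and assumes (for this
    sketch only) that the three effects multiply.
    """
    age_effect = 1.5 ** extra_decades                # 1.5-fold per decade
    smoking_effect = 2.1 if current_smoker else 1.0  # current vs never-smoker
    burden_effect = 2.0 ** (extra_mutations / 5000)  # doubling per 5,000 mutations
    return age_effect * smoking_effect * burden_effect


# Two extra decades of life alone: 1.5 * 1.5 = 2.25-fold.
two_decades = relative_driver_rate(extra_decades=2)

# A current smoker whose cell carries 5,000 extra mutations: 2.1 * 2 = 4.2-fold.
smoker_high_burden = relative_driver_rate(current_smoker=True,
                                          extra_mutations=5000)
```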

Layering driver mutations onto phylogenetic trees revealed that 
driver mutations occurred throughout molecular time (Fig. 3a, 
Extended Data Fig. 6). Mutations in TP53 were much more likely to be
shared by two or more sequenced cells (Fig. 3e), however, suggesting 
that they either occur earlier in molecular time or drive larger clonal 
expansions. 


(slopes were estimated using LME models). The difference in slopes according 
to smoking status is highly significant (P=0.0009 for interaction term; LME
models). One outlying cell from an ex-smoker, which had more than 10,000 
mutations, was excluded from the plot to improve visualization. 


Telomere lengths 


To assess historic mitotic activity, we estimated telomere lengths 
from the sequencing data (Fig. 4). Bronchial cells from children had 
longer telomeres than did cells from adults (Extended Data Fig. 10), as 
expected, and telomere length showed no correlation with mutational 
burden in children. Among never-smokers, there was also minimal 
correlation between mutational burden and telomere length. In cur- 
rent smokers, however—and especially in ex-smokers—there was a 
strong inverse relationship between telomere length and mutational 
burden, independent of the number of driver mutations (P=0.0009 for 
interaction between smoking status and telomere length by LME mod- 
els; Supplementary Code). In particular, the cells with a near-normal 
mutational burden in ex-smokers had considerably longer telomeres 
than did their more-mutated counterparts, suggesting that they have 
historically undergone fewer cell divisions. 


Discussion 


The simplicity of the notion that cigarette smoking causes lung cancer 
through its mutagenic effects belies the underlying complexity of how 
tobacco shapes clonal dynamics, mutation acquisition and the selec- 
tive environment in the bronchus. As expected, exposure to tobacco 
smoke increases the number of somatic mutations (by an average of a 
few thousand mutations per normal bronchial cell); the excess muta- 
tions are attributable to signatures of carcinogens in cigarette smoke; 
and the increased mutational burden generates more driver mutations.
What is unexpected, however, is the pronounced within-patient variation
in mutational burden among smokers: cells from one small biopsy
of bronchial epithelium can vary tenfold in their mutational burden, 
from 1,000 to over 10,000 mutations per cell. 

Our cohort may be affected by recruitment bias, as samples could 
only ethically be obtained from individuals who underwent a clini- 
cally indicated bronchoscopy. Nonetheless, such a recruitment bias 
cannot explain the considerable within-patient variance in mutational 
burden, and this finding probably therefore applies to smokers more
generally. Understanding how heterogeneity in mutational burden 
among competing cells contributes to clonal evolution will be impor- 
tant for refining our models of lung cancer development, which usu- 
ally assume that the effects of carcinogens are homogeneous across 
a population of cells. We recently described similar heterogeneity in 
tobacco-induced mutagenesis among neighbouring clones within non-malignant
liver, suggesting that this phenomenon is not restricted to
bronchial epithelium. 

We found that a qualitatively distinct population of bronchial epithe- 
lial cells with a near-normal mutational burden exists in subjects with 
a history of smoking. These cells have the same mutational burden as
age-matched never-smokers; have low proportions of signatures from 
tobacco carcinogens and longer telomeres than more-mutated cells; 
and occur at a fourfold higher frequency in ex-smokers compared to 
current smokers. These cells are clearly protective against cancer—lung 
cancers that emerge in ex-smokers do not have a near-normal muta- 
tional burden, instead typically showing the high mutational burden 
that is associated with tobacco-induced signatures. 

Two points remain unclear: how these cells have avoided the high rates 
of mutations that are exhibited by neighbouring cells, and why this par- 
ticular population of cells expands after smoking cessation. The longer 
telomeres of these cells imply that cells with a near-normal burden have 
undergone fewer cell divisions, and therefore potentially represent recent 
descendants of quiescent stem cells. Although they remain elusive in 
human lung”, quiescent stem cells have been identified through lineage 
tracing in mouse models, and have been shown to occupy a protected 
niche in submucosal glands and expand after lung injury? >. A physi- 
cally protected niche could explain how such stem cells avoid exposure 
to tobacco carcinogens, but so too could mitotic quiescence itself, as 
replication is required to convert adducted DNA bases to mutations. 

It is tempting to assume that the expansion of cells with a near- 
normal burden after smoking cessation arises through better fitness 
in the altered selection landscape—perhaps because these cells have 
longer telomeres or fewer mutations, or because aberrant NOTCH 
or TP53 signalling confers less advantage in the absence of tobacco 
smoke. These explanations notwithstanding, the apparent expansion 
of the near-normal cells could represent the expected physiology of 
a two-compartment model in which relatively short-lived proliferative
progenitors are slowly replenished from a pool of quiescent stem
cells, but the progenitors are more exposed to tobacco carcinogens. 
Only in ex-smokers would the difference in mutagenic environment be 
sufficient to distinguish newly produced progenitors from long-term 
occupants of the bronchial epithelial surface. 

Epidemiological studies show that the health benefits of stopping 
smoking begin immediately, accrue with time since cessation and are 
evident even after quitting late in life’. That these benefits could be 
facilitated by replenishment of the bronchial epithelium with cells that 
are essentially impervious to decades of sustained cigarette smoking 
attests to the resilience and regenerative capacity of the lungs. The mes- 
sage for public health is that stopping smoking—at any age—does not 
just slow the accumulation of further damage, but can also reawaken 
cells that have not been damaged by past lifestyle choices. 


Online content 


Any methods, additional references, Nature Research reporting sum- 
maries, source data, extended data, supplementary information, 
acknowledgements, peer review information; details of author con- 
tributions and competing interests; and statements of data and code 
availability are available at https://doi.org/10.1038/s41586-020-1961-1. 


272 | Nature | Vol 578 | 13 February 2020


1. Alberg, A. J., Brock, M. V., Ford, J. G., Samet, J. M. & Spivack, S. D. Epidemiology of lung
cancer. Diagnosis and management of lung cancer, 3rd ed: American College of Chest
Physicians evidence-based clinical practice guidelines. Chest 143, e1S-e29S (2013).

2. Peto, R. et al. Smoking, smoking cessation, and lung cancer in the UK since 1950:
combination of national statistics with two case-control studies. Br. Med. J. 321, 323-329
(2000).

3. International Agency for Research on Cancer. Tobacco Smoke and Involuntary Smoking.
IARC Monographs on the Evaluation of Carcinogenic Risks to Humans Vol. 83 (IARC and
World Health Organization, 2004).

4. Hecht, S. S. Progress and challenges in selected areas of tobacco carcinogenesis. Chem.
Res. Toxicol. 21, 160-171 (2008).

5. Pfeifer, G. P. et al. Tobacco smoke carcinogens, DNA damage and p53 mutations in
smoking-associated cancers. Oncogene 21, 7435-7451 (2002).

6. Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of
tobacco exposure. Nature 463, 184-190 (2010).

7. Imielinski, M. et al. Mapping the hallmarks of lung adenocarcinoma with massively
parallel sequencing. Cell 150, 1107-1120 (2012).

8. Alexandrov, L. B. et al. Mutational signatures associated with tobacco smoking in human
cancer. Science 354, 618-622 (2016).

9. George, J. et al. Comprehensive genomic profiles of small cell lung cancer. Nature 524,
47-53 (2015).

10. Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J.
Med. 376, 2109-2121 (2017).

11. Tomasetti, C., Marchionni, L., Nowak, M. A., Parmigiani, G. & Vogelstein, B. Only three
driver gene mutations are required for the development of lung and colorectal cancers.
Proc. Natl Acad. Sci. USA 112, 118-123 (2015).

12. Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell
171, 1029-1041.e21 (2017).

13. Campbell, J. D. et al. Distinct patterns of somatic genome alterations in lung
adenocarcinomas and squamous cell carcinomas. Nat. Genet. 48, 607-616 (2016).

14. Garfinkel, L. & Stellman, S. D. Smoking and lung cancer in women: findings in a
prospective study. Cancer Res. 48, 6951-6955 (1988).

15. Armitage, P. Response to Richard Doll: the age distribution of cancer. J. Roy. Stat. Soc. A
134, 155-156 (1971).

16. Doll, R. & Peto, R. Cigarette smoking and bronchial carcinoma: dose and time
relationships among regular smokers and lifelong non-smokers. J. Epidemiol. Community
Health 32, 303-313 (1978).

17. Lee, J. J.-K. et al. Tracing oncogene rearrangements in the mutational history of lung
adenocarcinoma. Cell 177, 1842-1857 (2019).

18. Cancer Genome Atlas Research Network. Comprehensive genomic characterization of
squamous cell lung cancers. Nature 489, 519-525 (2012).

19. Kucab, J. E. et al. A compendium of mutational signatures of environmental agents. Cell
177, 821-836 (2019).

20. Blokzijl, F., Janssen, R., van Boxtel, R. & Cuppen, E. MutationalPatterns: comprehensive 
genome-wide analysis of mutational processes. Genome Med. 10, 33 (2018). 

21. Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals 
episodic APOBEC mutagenesis. Cell 176, 1282-1294 (2019). 

22. Haradhvala, N. J. et al. Mutational strand asymmetries in cancer genomes reveal 
mechanisms of DNA damage and repair. Cell 164, 538-549 (2016). 

23. Letouzé, E. et al. Mutational signatures reveal the dynamic interplay of risk factors and 
cellular processes during liver tumorigenesis. Nat. Commun. 8, 1315 (2017). 

24. Alexandrov, L. et al. The repertoire of mutational signatures in human cancer. Nature 
https://doi.org/10.1038/s41586-020-1943-3 (2020). 

25. Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung 
adenocarcinoma. Nature 511, 543-550 (2014). 

26. Lawrence, M.S. et al. Discovery and saturation analysis of cancer genes across 21 tumour 
types. Nature 505, 495-501 (2014). 

27. Martincorena, I. et al. High burden and pervasive positive selection of somatic mutations
in normal human skin. Science 348, 880-886 (2015).

28. Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age.
Science 362, 911-917 (2018). 

29. Yokoyama, A. et al. Age-related remodelling of oesophageal epithelia by mutated cancer 
drivers. Nature 565, 312-317 (2019). 

30. Yizhak, K. et al. RNA sequence analysis reveals macroscopic somatic clonal expansion 
across normal tissues. Science 364, eaaw0726 (2019). 

31. Brunner, S. F. et al. Somatic mutations and clonal dynamics in healthy and cirrhotic 
human liver. Nature 574, 538-542 (2019). 

32. Teixeira, V. H. et al. Stochastic homeostasis in human airway epithelium is achieved by 
neutral competition of basal cell progenitors. eLife 2, e00966 (2013).

33. Hegab, A. E. et al. Novel stem/progenitor cell population from murine tracheal 
submucosal gland ducts with multipotent regenerative potential. Stem Cells 29, 
1283-1293 (2011). 

34. Tata, A. et al. Myoepithelial cells of submucosal glands can function as reserve stem cells 
to regenerate airways after injury. Cell Stem Cell 22, 668-683 (2018). 

35. Lynch, T. J. et al. Submucosal gland myoepithelial cells are reserve stem cells that can 
regenerate mouse tracheal epithelium. Cell Stem Cell 22, 653-667 (2018). 


Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in 
published maps and institutional affiliations. 


© The Author(s), under exclusive licence to Springer Nature Limited 2020 


Methods 


Data reporting 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Subjects 

Subjects were recruited at University College London Hospitals (UCLH) 
or Great Ormond Street Hospital (GOSH) and gave written informed 
consent with approval of the Research Ethics Committee (REC refer- 
ence 06/Q0505/12 and 11/LO/152, respectively). Details of the patients 
studied are listed in Supplementary Table 1. All patients underwent 
bronchoscopy as part of their clinical care. In adults, the bronchoscopy 
procedure was performed for diagnostic or surveillance indications; 
in children, it was undertaken for investigational procedures on con- 
genital tracheal abnormalities. For five patients with squamous cell 
carcinomas or carcinoma in situ, biopsy of normal bronchial tissue 
was taken from a site distant from the tumour. 


Single-cell-derived colonies 
Endobronchial biopsies were dissociated using 16 U/ml dispase in RPMI 
for 20 min at room temperature. The epithelium was dissected away 
from the underlying stroma and fetal bovine serum (FBS) was added 
to a final concentration of 10%. Both the epithelium and stroma were 
combined and digested in 0.1% trypsin/EDTA at 37 °C for 30 min. The 
solution was neutralized with FBS to a final concentration of 10% and 
added to the neutralized dispase solution®. Cells were passed through 
a 100-µm cell strainer and stained in sorting buffer (1x PBS, 1% FBS,
25 mM HEPES and 1 mM EDTA) with anti-CD45-PE (BD Pharmingen
555483, 1:200), anti-CD31-PE (BD Pharmingen 555446, 1:200),
anti-EPCAM-APC (Biolegend 324208, 1:50) antibodies and DAPI (1 µg/ml). For
endobronchial brushings, no dissociation was carried out and the cell
suspension was passed through a 100-µm cell strainer before staining.
Cells were single-cell sorted on the basis of their expression
of CD45, CD31 and EpCAM, using a BD FACSAria Fusion. Each
DAPI−CD45−CD31−EpCAM+ cell was sorted into 1 well of a 96-well plate,
pre-coated with collagen I and mitotically inactivated 3T3-J2 feeder 
cells. Feeder cells were authenticated by whole-genome sequencing, 
and were screened for mycoplasma contamination by PCR. Cells were 
grown in fresh epithelial growth medium” (Dulbecco's modified Eagle
medium (DMEM):F12 at a 3:1 ratio with penicillin-streptomycin, 5% FBS,
5 µM Y-27632, 5 µg/ml insulin, 25 ng/ml hydrocortisone, 0.125 ng/ml
epidermal growth factor, 0.1 nM cholera toxin, 250 ng/ml amphotericin
B and 10 µg/ml gentamicin), which was supplemented for the first week
of culture with epithelial growth medium that had been conditioned
on growing epithelial cells and a final concentration of 10 µM Y-27632.
Epithelial cells were grown in 96-well plates for 2 weeks before being
passaged into 24-well plates and then into T25 flasks. Epithelial cells
were in culture for a total of about 25 days at 37 °C and 5% CO2 with 3
changes of medium per week. When cells reached 70-80% confluence 
in T25 flasks, they were differentially trypsinized (making use of the 
greater sensitivity of feeder cells to trypsin compared with epithelial 
cells), generating a mostly pure population of epithelial cells. DNA was 
then extracted using the PureLink Genomic DNA Mini Kit (Invitrogen). 


Whole-genome sequencing 

Paired-end sequencing reads (150 bp) were generated using the Illumina 
HiSeq X Ten platform for 662 samples from 16 patients. The target coverage
was 15x per sample, except for 30x for 26 pilot samples that were
derived from the first patient (PD26988). For ten patients, blood DNA 
samples were also sequenced as germline controls. For three patients, 
samples of bulk squamous cell carcinoma or carcinoma in situ, which 
were collected at the same or close time points (around four months 
after), were sequenced, including two samples of carcinoma in situ 


that were used in a previous study*® (PD38326a and PD38327a, which 
are carcinomas in situ that were derived from PD30160 and PD34210, 
respectively). We also sequenced the whole genome of the pure mouse 
feeder cell layer. 


Discrimination of human and mouse sequences 

Bronchial epithelium samples were cultured on J2 mouse embryonic 
feeder fibroblast cells, which caused various degrees of contamination 
of mouse DNA in the samples from bronchial cell colonies. To remove 
mouse-derived sequencing reads, we used the Xenome algorithm” 
with default settings (k-mer size = 25). The Xenome algorithm classifies
reads from fastq files into five categories: graft (human), host (mouse),
ambiguous, both and neither. We confirmed that most of the sequencing
reads of a sample of pure human DNA were classified as human (98%) and
those of a sample of DNA derived from mouse feeder cells were rarely (2.8%)
classified as human (Extended Data Fig. 2a). In addition, we mapped 
sequencing reads of a DNA sample from mouse feeder fibroblasts to 
the human reference genome, and confirmed that most of the mouse- 
derived mutations had been successfully removed using Xenome for 
selected samples with mouse contamination (Extended Data Fig. 2b). 
Although all samples were negative for mycoplasma using standard 
laboratory PCR testing, Xenome identified sequencing reads derived 
from the mycoplasma genome in a subset of samples, and assigned 
them to the ‘neither’ classification. 

With testing complete, we ran Xenome for all bronchial epithelium 
samples, and aligned only reads that were classified as human to the 
human reference genome (NCBI build 37d5) using the BWA-MEM algo- 
rithm. The metrics of sequencing coverage and proportion of human- 
derived reads are listed in Supplementary Table 2, and 20 samples 
with an average sequencing depth of less than 8x were excluded from 
further analysis owing to lower estimated sensitivity, as described later 
(Extended Data Fig. 2e). 


Clonality of samples 

To ensure that each sample was single-cell-derived, we visually 
inspected the distribution of variant allele fractions (VAFs) of mutations:
632 clones had VAFs distributed around 50%, confirming that they were
derived from a single cell, but 10 clones had lower allele fractions,
suggesting that these
colonies were oligoclonal (Extended Data Fig. 2d). These samples were 
removed from further analyses (Supplementary Table 2). 
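
The two sample-level exclusions described so far (low sequencing depth and evidence of oligoclonality) can be summarized in a short Python sketch. The depth cut-off of 8x comes from the text; the clonality check here replaces visual inspection with a median-VAF threshold, which is a hypothetical stand-in chosen only for illustration and is not the procedure used in this study.

```python
from statistics import median

MIN_MEAN_DEPTH = 8.0   # depth cut-off stated in the text
MIN_MEDIAN_VAF = 0.4   # hypothetical threshold standing in for visual inspection


def classify_colony(mean_depth, vafs):
    """Return 'low_depth', 'oligoclonal' or 'pass' for one colony.

    mean_depth: average sequencing depth of the colony.
    vafs: variant allele fractions of the mutations called in the colony.
    """
    if mean_depth < MIN_MEAN_DEPTH:
        return "low_depth"
    if not vafs or median(vafs) < MIN_MEDIAN_VAF:
        # VAFs well below 0.5 suggest that more than one founder cell
        # contributed to the colony.
        return "oligoclonal"
    return "pass"
```

The sample counts reported in the text are consistent with these two steps: 662 sequenced samples minus 20 with low depth minus 10 oligoclonal colonies leaves 632.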


Single-base-substitution calling 

Single-base substitutions were called using the Cancer Variants through 
Expectation Maximization (CaVEMan) algorithm*° with copy-number 
options of major copy number 5, minor copy number 2 and normal contamination
0.1. To allow the discovery of early embryonic mutations,
we ran CaVEMan using an unmatched normal control. In addition to the
default ‘PASS’ filter, we removed variants with a median alignment score 
(ASMD) < 120 and those with a clipping index (CLPM) > 0, to remove 
mapping artefacts. Variants identified in the mouse feeder fibroblast 
DNA sample were also removed, if they persisted in the call-set. Subse- 
quently, for every mutation identified in any colonies from each patient, 
we counted the number of mutant and wild-type reads in all bronchial 
samples from the same patient using the bam2R function of the R package
deepSNV“, for which bases with ≥30 base quality and sequencing
reads with >30 mapping quality were used. Further filters described
below were applied to identify true somatic mutations and separate 
them from either germline variants or recurrent sequencing errors. 


Removing germline variants (binomial filter) 

We fitted a binomial distribution to the total variant counts and total 
depth at each single-base substitution site across all samples from one 
patient. To differentiate somatic variants from germline variants, we 
used a one-sided exact binomial test, with the null hypothesis that 
these variants were drawn from a binomial distribution with a success
probability of 0.5 (0.95 for sex chromosomes in males). The alternative
hypothesis was that these variants were drawn from distributions with 
lower success probabilities. Variants with Pvalue > 107° were considered 
as germline variants. 
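
A one-sided exact binomial test of this kind can be sketched in plain Python as follows. This is an illustrative reimplementation, not the code used in the study: the function names are hypothetical, and the P-value cut-off is left as a parameter (`p_cutoff`), with the value in the usage lines chosen purely for demonstration.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in a numerically safe way."""
    return min(1.0, sum(exp(log_binom_pmf(i, n, p)) for i in range(k + 1)))


def is_germline(variant_reads, total_depth, p_cutoff, expected_vaf=0.5):
    """Flag a site as germline if a VAF below expected_vaf cannot be shown.

    variant_reads and total_depth are pooled across all samples from one
    patient. The one-sided P value is P(X <= variant_reads) under the null
    that reads are drawn with success probability expected_vaf.
    """
    p_value = binom_cdf(variant_reads, total_depth, expected_vaf)
    return p_value > p_cutoff


# Illustrative cut-off only; pooled counts are hypothetical.
looks_germline = is_germline(50, 100, p_cutoff=1e-6)  # about half of reads
looks_somatic = is_germline(5, 100, p_cutoff=1e-6)    # present in few reads
```

For sites on the sex chromosomes in males, the same function would be called with `expected_vaf=0.95`, following the text.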


Removing errors (beta-binomial filter) 

We fitted a beta-binomial distribution to the variant counts and depths 
of all single-base substitutions across samples from the same patient 
for the remaining somatic variants. The beta-binomial was used as 
it captures the difference between artefactual variant sites and true 
somatic variants. Many artefacts appear to be randomly distributed 
across samples and can be modelled as drawn from a binomial dis- 
tribution. True somatic variants will be present at a high VAF in some 
samples, but absent in others, and are hence best captured by a highly 
overdispersed beta-binomial. For each variant site, the maximum-likelihood
estimate of the overdispersion factor (ρ) was calculated using a grid-based
method (ranging from a value of 10° to 10). Variants with ρ < 0.1
were filtered out and considered to be artefactual, because a low ρ
indicates read support spread evenly across samples rather than
concentrated in a subset of colonies. The code for this
filter is based on the Shearwater variant caller. 
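
A grid-based maximum-likelihood estimate of the overdispersion parameter can be sketched in plain Python as below. This is not the Shearwater-derived code used in the study. It assumes the common mean/overdispersion parameterization of the beta-binomial (α = μ(1 − ρ)/ρ, β = (1 − μ)(1 − ρ)/ρ), fixes μ at the pooled allele fraction, and uses a hypothetical grid; all function names are illustrative.

```python
from math import lgamma


def lbeta(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_loglik(counts, depths, mu, rho):
    """Log-likelihood of per-sample variant counts under a beta-binomial."""
    alpha = mu * (1 - rho) / rho
    beta = (1 - mu) * (1 - rho) / rho
    total = 0.0
    for x, n in zip(counts, depths):
        log_choose = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
        total += log_choose + lbeta(x + alpha, n - x + beta) - lbeta(alpha, beta)
    return total


def estimate_rho(counts, depths, grid=None):
    """Return the grid value of rho that maximizes the likelihood."""
    if grid is None:
        # Hypothetical grid: 10**-6 to 10**-0.05 in log10 steps of 0.05.
        grid = [10 ** (-6 + 0.05 * i) for i in range(120)]
    mu = sum(counts) / sum(depths)
    if not 0 < mu < 1:
        raise ValueError("pooled allele fraction must lie strictly between 0 and 1")
    return max(grid, key=lambda rho: betabinom_loglik(counts, depths, mu, rho))


depths = [20] * 10
# A few variant reads in every sample: the pattern expected of an artefact.
rho_artefact = estimate_rho([2, 1, 2, 1, 2, 2, 1, 2, 1, 2], depths)
# High VAF in three colonies and absent elsewhere: a clonal somatic variant.
rho_somatic = estimate_rho([10, 0, 0, 9, 0, 0, 0, 11, 0, 0], depths)
```

In this toy example the evenly spread site yields an estimate at the bottom of the grid, whereas the site confined to a few colonies yields a large estimate, which is the contrast that the filter exploits.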


Removing mutations that were induced in vitro 

We observed peaks of lower VAFs in a subset of samples (Extended 
Data Fig. 2c), which suggested that mutations were present that had 
arisen during the in vitro expansion of the single cell. These peaks were 
more prominent in samples from children, suggesting that the number 
of this kind of mutation is relatively small. They would, however, be 
more prominent in samples with a low true mutational burden, such 
as in children. We discarded mutations with a median VAF < 0.3 for 
autosomal regions and median VAF < 0.6 for sex chromosomes across 
all samples from the same patient; these cut-offs were determined on
the basis of the observed distribution of VAFs here and in a previous
report.
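
The median-VAF cut-off can be expressed in a few lines of Python. This is only a sketch: the function name is hypothetical, and which samples contribute to the median (here, whatever list of VAFs the caller supplies) is an assumption that should be checked against the Supplementary Code.

```python
from statistics import median

def passes_in_vitro_filter(vafs, sex_chromosome=False,
                           autosomal_cutoff=0.3, sex_cutoff=0.6):
    """Keep a mutation only if its median VAF reaches the relevant cut-off.

    The defaults are the cut-offs quoted for single-base substitutions.
    """
    cutoff = sex_cutoff if sex_chromosome else autosomal_cutoff
    return median(vafs) >= cutoff
```

For example, a mutation seen at VAFs of 0.48, 0.52 and 0.50 is kept, whereas one at 0.22, 0.27 and 0.25 is discarded as probably acquired in vitro.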

We quantified sensitivity by measuring how well our algorithms 
called heterozygous germline polymorphisms in the colonies depend- 
ing on sequencing depth; as our colonies are single-cell-derived, we 
would expect heterozygous germline single-nucleotide polymor- 
phisms to have the same VAF distribution as true somatic mutations 
in that original single cell. We find that a sequencing depth of 8x leads 
to an estimated sensitivity of 70-75%, increasing to more than 95% 
at a sequencing depth of 15x. The majority of the colonies that we 
sequenced had depths of greater than 15x, and we set a minimum cut-off
depth of 8x for inclusion of a colony within the study (Extended
Data Fig. 2e). Finally, we visually inspected allelic counts of removed 
germline variants with two or more samples without any mutant reads, 
and rescued embryonic mutations. Somatic variants were annotated 
using ANNOVAR.


Indel calling 

Indels were called using cgpPindel, and an unmatched normal sample
was used as the germline control. Indels that were detected in mouse
fibroblast feeder cells were removed as mouse-derived artefacts. For
all indels, indel-positive or -negative sequencing reads were counted
using cgpVAF across all samples from each patient.

To remove germline variants and recurrent sequencing errors, the 
same binomial and beta-binomial filters were used as described above 
for single-base substitutions. We discarded mutations with a median 
VAF < 0.25 for autosomal regions and median VAF < 0.5 for sex chromo- 
somes across all samples from the same patient to remove mutations 
that were induced in vitro.


Double-base-substitution calling

We first identified candidate double-base substitutions based on side- 
by-side single-base substitutions that were called using CaVEMan for 
each patient, and ran cgpVAF across all samples from each patient to 
remove those called in independent reads. Double-base substitutions
with three or more mutant reads in at least one sample were considered
as true positives. Germline variants, errors and mutations induced 
in vitro were filtered as for single-base substitutions and indels. 
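
The first step, pairing side-by-side single-base substitutions into candidate double-base substitutions, might look like the sketch below. The tuple layout and function name are assumptions for illustration, and the subsequent read-level check with cgpVAF (which confirms that both changes sit on the same reads) is not reproduced here.

```python
def candidate_double_base_substitutions(sbs_calls):
    """Pair substitutions at adjacent positions on the same chromosome.

    sbs_calls: iterable of (chrom, pos, ref, alt) tuples for one patient.
    Returns a list of (chrom, start_pos, ref_dinucleotide, alt_dinucleotide).
    """
    calls = sorted(sbs_calls)
    candidates = []
    for (c1, p1, r1, a1), (c2, p2, r2, a2) in zip(calls, calls[1:]):
        if c1 == c2 and p2 == p1 + 1:
            candidates.append((c1, p1, r1 + r2, a1 + a2))
    return candidates
```

For instance, C>A calls at chr1:100 and chr1:101 would be merged into a single CC>AA candidate, whereas a call at the same coordinate on a different chromosome would not be paired.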


Structural-variant calling 

Structural variants were called using the BRASS algorithm, and
matched normal samples (including blood samples and normal 
bronchial samples that were assigned on distantly located branches 
in phylogenetic trees) were used as controls. To remove germline 
structural variants, we filtered structural variants that were detected 
in the descendant colonies of both of the earliest two branches at the 
top of the phylogenetic tree for each patient. If the earliest branch had 
three or more branches (polytomy), those detected in both descendant 
and non-descendant samples of the earliest branch with the highest 
number were removed. We further filtered structural variants that 
were not identified using an unmatched normal control, to remove 
structural variants that were not filtered owing to the lower sequencing 
coverage of the matched normal control sample. Structural variants 
that were detected in other patients were also removed as germline 
variants or errors. Finally, all remaining structural-variant calls were 
manually inspected using the Integrated Genomics Viewer (IGV) to 
confirm somatic variants. 


Copy-number calling 

Copy-number changes were called using the ASCAT algorithm, and
the same matched normal control samples as those used in the struc- 
tural-variant analysis were used as germline controls. Copy-number 
gains, losses and copy-neutral LOHs were visually confirmed using 
LogR and BAF plots in ascatNgs. For amplification, those copy-number 
changes that were greater than 100 kb were visually confirmed using 
ascatNgs and JBrowse.


Mutational burden and estimation of the effects of age and
smoking

For single-base substitutions, indels and double-base substitutions, 
samples with three or more mutant reads and a VAF of 0.2 or higher 
were considered to be mutated, and the number of each class of genetic 
lesions was counted for all bronchial cells. For structural variants,
chromoplexy (Extended Data Fig. 4c), chromothripsis (Extended Data
Fig. 4d) and translocation pairs with similar breakpoints were consid- 
ered as single structural variants. Genetic lesions that were identified 
both as structural variants and copy-number changes were also con- 
sidered as single events. 

An LME model was then fitted to estimate the effects of age and smoking
status on the number of single-base substitutions or indels using
the nlme package in R (Supplementary Code). In addition to the fixed 
effects of age and smoking, patient was used as a grouping variable in 
the random effect, in which smoking status was used as a modifier of 
between-patient difference. Difference of within-group heterogene- 
ity (heteroscedasticity) according to smoking status was also fitted 
in this model. The intercept of this model was probably derived from 
embryonic mutations and mutations that were introduced in vitro. 
Models were fitted using maximum likelihood estimation, and nested 
models were compared using likelihood ratio tests. 
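
One way to write down a model of this kind is given below. It is a sketch of a plausible parameterization rather than the exact specification in the Supplementary Code: the symbols are introduced here for illustration, and any interaction terms or alternative variance structures used by the authors are not shown.

```latex
% y_{ij}: mutation count in colony j of patient i; s_i: smoking status of patient i
y_{ij} = \beta_0 + \beta_{\mathrm{age}}\,\mathrm{age}_i + \beta_{s_i} + u_i + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}\!\left(0,\ \sigma^2_{\mathrm{between},\,s_i}\right),
\qquad
\varepsilon_{ij} \sim \mathcal{N}\!\left(0,\ \sigma^2_{\mathrm{within},\,s_i}\right)
```

Here the between-patient variance and the within-patient (residual) variance are both allowed to differ by smoking status, which corresponds to the modifier of between-patient difference and the heteroscedasticity described above, and the intercept absorbs mutations that do not scale with age.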


Identification of near-normal lung cells 

We define cells as having a near-normal mutational burden if they have 
a mutational burden that is less than two non-smoker within-patient 
standard deviations plus two non-smoker between-patient standard 
deviations above the estimated number of mutations accumulated at 
the age of that patient using an LME model (Supplementary Code). The 
fraction of cells with a near-normal mutational burden was compared 
between current smokers and ex-smokers with log-linear regression 
using the logarithm of the total number of cells sequenced per patient 
as an offset. 
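
The threshold described above reduces to simple arithmetic, sketched here in Python. The function names are hypothetical, and the expected burden and the two non-smoker standard deviations are assumed to come from the fitted LME model.

```python
def near_normal_threshold(expected_burden, sd_within, sd_between):
    """Expected burden at the patient's age plus two within-patient and
    two between-patient non-smoker standard deviations."""
    return expected_burden + 2 * sd_within + 2 * sd_between

def is_near_normal(burden, expected_burden, sd_within, sd_between):
    """True if a cell's mutational burden lies below the near-normal threshold."""
    return burden < near_normal_threshold(expected_burden, sd_within, sd_between)
```

With made-up values of 1,000 expected mutations and standard deviations of 100 (within) and 150 (between), the threshold is 1,500, so a cell with 1,200 mutations would be classed as near-normal and one with 1,600 would not.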


Construction of phylogenetic trees 
Phylogenetic trees were built with maximum parsimony using sub- 
stitutions for each patient. First, the input matrix of mutations was 
made, in which samples with a VAF of 0.2 or higher and three or more 
mutant reads were considered as mutated samples and labelled as ‘1’, 
and remaining samples were labelled as ‘0’. Among samples labelled as 
‘0’, samples with (i) a sequencing depth of 6x or less for each mutated
base and (ii) one or more mutant reads were considered as undeter- 
mined and labelled as ‘?’. For every individual, phylogenetic trees were 
constructed using the Camin-Sokal method of the Mix program of 
the RPhylip package, and consensus trees of all the trees were then 
constructed using the Consensus program in RPhylip. 
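
The labelling rules for the input matrix can be sketched as below. This follows the text as written (both conditions must hold for a sample to be marked undetermined); the function names and the dictionary layout are assumptions, and the tree building itself (Mix and Consensus in RPhylip) is not reproduced.

```python
def label_sample(mutant_reads, depth):
    """Return '1' (mutated), '?' (undetermined) or '0' (wild-type) for one sample at one site."""
    vaf = mutant_reads / depth if depth > 0 else 0.0
    if mutant_reads >= 3 and vaf >= 0.2:
        return "1"
    if depth <= 6 and mutant_reads >= 1:
        return "?"
    return "0"

def build_input_matrix(read_counts):
    """read_counts: {mutation: {sample: (mutant_reads, depth)}} -> {mutation: {sample: label}}."""
    return {mutation: {sample: label_sample(*counts)
                       for sample, counts in per_sample.items()}
            for mutation, per_sample in read_counts.items()}
```

A sample with 5 mutant reads at 20x depth is labelled ‘1’, one with a single mutant read at 5x depth is labelled ‘?’, and one with 2 mutant reads at 30x depth is labelled ‘0’.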

Subsequently, all mutations were reassigned to branches in the
phylogenetic trees. If mutations were called in all the descendants
of a given branch and in no samples that were not descendants of the
branch, mutations were perfectly assigned to those branches. Given
the existence of samples with relatively lower sequencing depths for
each mutated position, we also assigned mutations to branches if
mutations were called in all but one undetermined descendant labelled as
‘?’ of a given branch, and all samples that were not descendants of the
branch were wild-type (‘0’). Given the smaller number of indels and
double-base substitutions, these were assigned to each branch using
the tree defined from single-base substitutions, rather than generating
new trees for the other mutation types.
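
The branch-assignment rules can be illustrated with the following sketch. The data structures are hypothetical, and one simplification is an assumption of this example: non-descendant samples are required to be wild-type (‘0’) for both the perfect and the relaxed rule, which avoids ambiguous assignments when an undetermined sample lies outside a branch.

```python
def assign_branch(labels, branches):
    """Assign one mutation to a branch, or return None if no branch fits.

    labels:   {sample: '1' | '0' | '?'} for the mutation.
    branches: {branch name: set of descendant samples}.
    A perfect match (all descendants called) takes priority over a relaxed
    match (all but one descendant called, the remaining one undetermined).
    """
    relaxed_match = None
    for name, descendants in branches.items():
        inside = [labels[s] for s in descendants]
        outside = [labels[s] for s in labels if s not in descendants]
        if any(lab != "0" for lab in outside):
            continue
        called = inside.count("1")
        undetermined = inside.count("?")
        if called == len(inside):
            return name
        if (relaxed_match is None and called >= 1
                and called == len(inside) - 1 and undetermined == 1):
            relaxed_match = name
    return relaxed_match
```

With three samples A, B and C and a branch whose descendants are A and B, a mutation called in A and B but not C is assigned to that branch, as is a mutation called in A, undetermined in B and absent from C.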


Extraction of SBS signatures 

To analyse mutational signatures for single-base substitutions, those 
assigned to each branch of the phylogenetic trees were categorized 
into 288 subtypes, consisting of 6 mutation classes by 16 5’- and 
3’-base contexts on the transcribed strand, non-transcribed strand 
or intergenic region. Mutational signatures were extracted using the 
HDP package, relying on the hierarchical Bayesian Dirichlet process
(https://github.com/nicolaroberts/hdp). Owing to the lack of
reference signatures categorized into 288 subtypes, we conducted a 
de novo signature extraction. We included somatic mutations from 
squamous cell lung carcinomas sequenced by The Cancer Genome 
Atlas (TCGA) and from in vitro single-cell culture controls as separate 
samples to maintain comparability with signatures that have already 
been established in previous studies. For identified SBS signatures, 
signatures with >0.90 cosine similarity with reported signatures, in 
terms of distribution to 96 or 192 subtypes, were considered as the
same signatures, including SBS-1, SBS-4, SBS-5, SBS-16 and SBS-18. 
For the remaining new signatures, the expectation-maximization 
algorithm was used to deconvolute these signatures into the five sig- 
natures above and other known signatures in lung cancers (SBS-2, 
SBS-8 and SBS-13), because it is difficult to separate signatures that 
are strongly correlated across samples. If a signature reconstituted 
from the components that expectation maximization extracted (only 
including signatures that accounted for at least 10% of mutations in 
each sample to avoid overfitting) had a >0.90 cosine similarity to the
original HDP signature, the signature was presented as its expectation- 
maximization deconvolution. Two HDP signatures met these criteria: 
one new signature was deconvoluted into a mixture of SBS-4 and SBS-5,
and another new signature was deconvoluted into SBS-2 and SBS-13.
After these analyses, seven known and two new SBS signatures were 
identified. 
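
The comparison of an extracted 288-channel signature with 96-channel reference signatures can be sketched as follows. The channel layout (three consecutive blocks of 96 for transcribed, non-transcribed and intergenic contexts), the function names and the toy reference vectors are assumptions for illustration; real reference signatures would be loaded from a published catalogue.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

def collapse_288_to_96(sig288):
    """Sum the three strand categories channel by channel (assumed block layout)."""
    if len(sig288) != 288:
        raise ValueError("expected 288 channels")
    return [sig288[i] + sig288[96 + i] + sig288[192 + i] for i in range(96)]

def best_match(extracted96, reference, cutoff=0.90):
    """Return (name, similarity) for the closest reference signature.

    The name is None when the best similarity does not exceed the cut-off.
    """
    name, ref = max(reference.items(),
                    key=lambda item: cosine_similarity(extracted96, item[1]))
    similarity = cosine_similarity(extracted96, ref)
    return (name if similarity > cutoff else None), similarity
```

Because cosine similarity is scale-invariant, the collapsed signature does not need to be renormalized before it is compared with the reference signatures.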

To validate these signatures that were identified using the HDP, 
we also analysed SBS signatures using the MutationalPatterns package,
which relies on non-negative matrix factorization. The optimal
factorization rank (7) was determined on the basis of the slope of 
the cophenetic correlation coefficient. MutationalPatterns identi- 
fied similar signatures to SBS-5 (Signature A), SBS-4 (Signature B), 
Sig-B (Signature D), SBS-18 (Signature E), SBS-1 (Signature F), SBS-2 
and SBS-13 (Signature G) (Extended Data Fig. 5a, b).


Extraction of indel and DBS signatures 

For indels and double-base substitutions, each type of genetic altera- 
tion that was assigned to each branch of the phylogenetic trees was 
categorized into 83 and 78 subtypes, as previously reported. First,
the algorithm was conditioned on the set of mutational signatures that 
have been detected in lung cancers (ID-1, ID-2, ID-3, ID-5, ID-6, ID-8, ID-9, 
DBS-2, DBS-4, DBS-5, DBS-6, DBS-11). This allows simultaneous discov- 
ery of known and new signatures. For known signatures, signatures 
identified by HDP with a cosine similarity >0.90 with corresponding 
reported signatures were accepted as known signatures. Deconvolution 
of new signatures to the above known signatures was also performed, 
and one new indel signature was deconvoluted into ID-5 and ID-8. Finally,
ten known signatures and one new signature were identified. 


Analysis of A>G transcriptional strand bias 

First, we measured the distance from mutations to the nearest transcrip- 
tion start sites (TSSs) of all the expressed genes in the lung; expressed 
genes were defined as those with a median of one or more transcripts 
per million in lung samples in the GTEx database
(https://gtexportal.org/home/). Mutations in regions of bidirectional transcription were
excluded from further analysis. We tiled 10 kb up and downstream of 
the TSSs into 1-kb bins, and counted the number of A>G mutations on 
transcribed and untranscribed regions in each tile. This number was 
further divided by the average number of bins in intergenic regions. 
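
The tiling step can be sketched as below. The function names and input layout are assumptions, positions are oriented so that positive bins lie downstream of the TSS in the direction of transcription, and the final normalization against intergenic regions is not shown.

```python
from collections import Counter

def tss_bin(mutation_pos, tss_pos, gene_strand, bin_size=1000, flank=10000):
    """Return the 1-kb bin index (-10 ... 9) relative to the TSS, or None if outside the window.

    Bin 0 covers the first kilobase downstream of the TSS; bin -1 the first kilobase upstream.
    """
    offset = mutation_pos - tss_pos
    if gene_strand == "-":
        offset = -offset
    if not -flank <= offset < flank:
        return None
    return offset // bin_size

def count_by_bin(mutations):
    """mutations: iterable of (mutation_pos, tss_pos, gene_strand, on_transcribed_strand)."""
    counts = Counter()
    for pos, tss, strand, transcribed in mutations:
        bin_index = tss_bin(pos, tss, strand)
        if bin_index is not None:
            counts[(bin_index, "transcribed" if transcribed else "untranscribed")] += 1
    return counts
```

For a gene on the minus strand the sign of the offset is flipped, so a mutation at a lower genomic coordinate than the TSS falls in a downstream bin.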


Analysis of driver variants 

To systematically identify genes under positive selection in normal 
bronchial epithelium, we used the dN/dS method. We performed
exome-wide dN/dS analysis and also analysed global dN/dS ratios for
driver genes (n = 86) reported in lung cancer or normal skin
or oesophagus tissues using dNdScv (Supplementary Table 3).
Genes with q value < 0.05 were reported as driver genes (Supplementary
Tables 4, 5). Finally, hot-spot mutations reported in COSMIC for four
or more patients were also considered as driver mutations, in addition
to those in the seven driver genes identified by dNdScv (Fig. 3b). The
proportion of shared mutations (found in more than one colony) and
private mutations (found in a single colony) was calculated for patients
other than PD30160 (who had a low number of sequenced samples
(n = 13)). For known lung cancer driver genes, the distributions of
mutations were compared between bronchial cells and lung squamous cell
carcinoma (Extended Data Fig. 9b).

To estimate the effect of smoking status on the number of driver 
mutations, a generalized LME model was fitted using the lme4 package
in R (Supplementary Code). Patient was modelled as a random effect,
and the fixed effects of age, smoking status and total mutational burden 
were fitted into the model. 


Estimation of telomere length 

The average telomere lengths of bronchial epithelium cells were
estimated from the whole-genome sequencing data using Telomerecat.
Considering the similarity of telomere sequences between human and
mouse, we aligned all sequencing reads to the human reference genome
using BWA-MEM without using Xenome, and then ran Telomerecat on 
the bam files. Samples with reported mouse contamination of more 
than 10% were excluded from further analysis to prevent a possible 
effect of mouse cells on telomere length. The average telomere length 
for the mouse fibroblast feeder samples was estimated at 1,745 bp, 
which is within the range of estimates of human telomere length, so 
a low level of mouse contamination will not substantially affect the 
estimates. 

An LME model was then fitted to estimate the effect of telomere
length on the number of single-base substitutions using the lme4
package in R (Supplementary Code). Patient was modelled as a random
effect, and the fixed effects of telomere length and its interaction with
smoking status, as well as the fixed effects of age and smoking status,
were fitted into the model.


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


Sequencing data have been deposited at the European Genome-phenome
Archive (http://www.ebi.ac.uk/ega/) under the accession number
EGAD00001005193. Somatic-mutation calls, including single-base
substitutions, indels and structural variants, from all 632 samples
have been deposited on Mendeley Data with the identifier:
https://doi.org/10.17632/b53h2kwpyy.2.


Code availability 


Detailed method and custom R scripts for the analysis of mutational
burden in bronchial epithelium are available in Supplementary Code.
Other packages used in the analysis are as follows: R v.3.5.1;
BWA-MEM v.0.7.17-r1188 (https://sourceforge.net/projects/bio-bwa/);
CaVEMan v.1.11.2 (https://github.com/cancerit/CaVEMan);
Pindel v.2.2.5 (https://github.com/cancerit/cgpPindel);
Brass v.6.1.2 (https://github.com/cancerit/BRASS);
ASCAT NGS v.4.1.2 (https://github.com/cancerit/ascatNgs);
Xenome (https://github.com/data61/gossamer/blob/master/docs/xenome.md);
deepSNV v.1.28.0 (https://bioconductor.org/packages/release/bioc/html/deepSNV.html);
ANNOVAR (http://wannovar.wglab.org/);
IGV (http://software.broadinstitute.org/software/igv/);
JBrowse (https://jbrowse.org/);
cgpVAF (https://github.com/cancerit/vafCorrect);
RPhylip v.0.1.23 (http://www.phytools.org/Rphylip/);
hdp v.0.1.5 (https://github.com/nicolaroberts/hdp);
MutationalPatterns v.1.8.0 (https://bioconductor.org/packages/release/bioc/html/MutationalPatterns.html);
dNdScv v.0.0.1 (https://github.com/im3sanger/dndscv);
and Telomerecat v.3.1.2 (https://github.com/jhrf/telomerecat).

Extended Data Fig. 1 | Flow-sorting strategy of single basal bronchial
epithelial cells. a, Sorting of EPCAM⁺ epithelial cells from human airway
biopsies. Human haematopoietic and endothelial cells were stained with
antibodies against CD45 and CD31, respectively. Within the population of cells
negative for those markers, EPCAM-expressing cells were gated. Single, live
(DAPI-negative) cells were flow-sorted from this population into individual
wells of 96-well plates. b, Quantitative PCR (qPCR) analysis of cultures of
clonally derived airway epithelial cells. Airway basal cells express integrin
subunit α6 (ITGA6), keratin 5 (KRT5), cadherin 1 (CDH1) and TP63. Expression is
shown in clonally derived cell cultures (n = 13 from 3 donors, coloured blue,
green and orange) compared to control bulk human bronchial epithelial cell
cultures (HBECs) that were expanded in the same culture conditions and lung
fibroblast cell cultures (lung fibs) that served as a negative control. The centre
values and error bars indicate mean and s.e.m., respectively. Conditions in
which no expression was detected are shown as 0. c, Colony-forming efficiency
of CD45⁻CD31⁻EPCAM⁺ cells after single-cell sorting from endobronchial
biopsy samples (n = 16). For one ex-smoker, EPCAM was not used to select cells
and only CD45⁻CD31⁻ cells were sorted; as expected, this was the patient with
the lowest colony-forming efficiency.

Extended Data Fig. 2 | Quality assurance of mutation calls. a, Stacked bar
chart showing the proportion of reads attributed to the human genome, mouse 
genome, both, neither, or with ambiguous mapping for the pure mouse 
fibroblast feeder line (left) or a pure human sample (right), assessed with the 
Xenome pipeline. b, Clean-up of mutation calls using the Xenome pipeline for 
one of the samples that was more heavily contaminated by the mouse feeder 
layer. The Venn diagram on the left shows the overlap in mutation calls before 
and after removing non-human reads by Xenome. c, Histograms of VAFs for two 
representative colonies in the sample set. The plot on the left shows a tight 
distribution around 50%, as expected for a colony derived from a single cell
without contamination. The plot on the right shows a bimodal distribution with
one peak at 50% (mutations present in the original basal cell) and a second peak
at around 25% (probably representing mutations that were acquired in vitro
during colony expansion). These second peaks at less than 50% are more
evident in colonies from children, owing to the low number of mutations in the
original basal cell. d, Histogram of VAFs for a colony seeded by more than one
basal cell, leading to a peak at much less than 50%. e, Estimated sensitivity of 
mutation calling according to sequencing depth. Heterozygous germline 
polymorphisms were identified in each subject; for each colony sequenced, we 
calculated the fraction of these polymorphisms that was recalled by our 
algorithms. f, Comparison of mutational burden in normal bronchial epithelial 
cells that neighbour a carcinoma in situ (CIS) versus cells distant from the CIS in 
five patients. The box-and-whisker plots show the distribution of mutational 
burden per colony within each subject, with the boxes indicating median and 
interquartile range and the whiskers denoting the range. The overlaid points 
are the observed mutational burden of individual colonies. 

Extended Data Fig. 3 | Colonies with a near-normal mutational burden.
a, Density distribution of mutational burden in cells from ex-smokers (green)
and current smokers (purple). The black vertical line shows the threshold for
near-normal mutational burden derived for each patient. The x axis is on a
logarithmic scale. Note the frequently bimodal distribution of mutational
burden, especially in the ex-smokers, with the modes separated at the
threshold for near-normal mutational burden. b, Flow cytometric analysis of
clones for expression of KRT5, EPCAM, ITGA6, podoplanin (PDPN), NGFR and
CD45 or CD31. Lung fibroblasts are included as a comparison. Fluorescence
minus one (FMO) is shown. Plots for one clone with a near-normal mutational
burden (low-mutant clone) and one with an increased burden (high-mutant
clone) are shown, and are representative of five clones from one patient.
c, Bright-field images of expanded clones at passage 3, showing cobblestone
epithelial morphology. Images are representative of five clones from one
patient. A clone with an increased mutational burden is shown at the top, and a
clone from an ex-smoker with a near-normal mutational burden is shown at the
bottom. For the left images, the magnification is x10 and the scale bar is
200 μm; for the right images, the magnification is x20 and the scale bar is
100 μm.

Extended Data Fig. 4 | Indels, copy-number changes and structural variants
in normal bronchial epithelial cells. a, Relationship of burden of indels per
cell with age. The points represent individual colonies (n = 632) and are
coloured by smoking status. The black line represents the fitted effect of age
on indel burden, which was estimated from LME models after correction for
smoking status and within-patient correlation structure. The blue shaded area
represents the 95% CI for the fitted line. b, Stacked bar plot showing the […]
across the 16 subjects. c, Three examples of chromoplexy in normal bronchial
cells. Structural variants are shown as coloured arcs that join two positions in
the genome around the circumference. The instances of chromoplexy all
consist of three translocations (purple). d, An example of chromothripsis in a
cell from an 11-month-old child. The plot on the right shows the copy number of
genomic windows in the relevant region of chromosome 1 (black points); the
lines and arcs denote the positions of observed structural variants.

Extended Data Fig. 5 | Comparison of mutational signatures that were
extracted using two algorithms. a, Trinucleotide contexts for the signatures
extracted by the hierarchical Dirichlet process (HDP) (left) and
MutationalPatterns non-negative matrix factorization (right). The six
substitution types are shown across the top of each signature. Within each
signature, the trinucleotide context is shown as four sets of four bars, grouped
by whether an A, C, G or T respectively is 5’ to the mutated base, and within each
group of four by whether A, C, G or T is 3’ to the mutated base (the order of bars
is the same as that shown in Fig. 2b). Where signatures show high cosine
similarity scores between algorithms, they are lined up horizontally. We note
that Signature C in MutationalPatterns does not have a match in the signatures
extracted by the HDP algorithm, but appears very similar to Signature A in
MutationalPatterns (or SBS-5 from the HDP). This means that it probably
represents over-splitting of the signatures. b, Heat map showing the cosine
similarities of signatures extracted by MutationalPatterns with those
extracted by the HDP. Only cosine-similarity scores that are greater than 0.75
are coloured. c, Scatter plots showing the fraction of mutations in each colony
(n = 632) assigned to each signature by the HDP algorithm (x axis) versus the
MutationalPatterns algorithm (y axis). The correlation values quoted are
Pearson’s correlation coefficients (R²). d, Transcriptional strand bias of A>G
mutations in an N[A]T context before and after TSSs. Note the absence of
transcriptional strand bias in intergenic regions but evidence for both
transcription-coupled damage and repair after the TSS, applying similarly in
both never-smokers and ex- or current smokers.

Extended Data Fig. 6 | Phylogenetic trees of 13 subjects. Phylogenetic trees
showing clonal relationships among normal bronchial cells in the 13 subjects
not shown in Fig. 3a. Branch lengths are proportional to the number of
mutations (x axis) specific to that clone or subclone. Each branch is coloured by
the proportion of mutations on that branch that are attributed to the various
SBS signatures.

Extended Data Fig. 7 | Indel signatures in the sample set. a, Five indel
signatures (ID-1, ID-2, ID-3, ID-5 and ID-8) were extracted by the HDP. The
contributions of different types of indels to each signature are shown, grouped
by whether variants are deletions or insertions; the size of the event; whether
they occur at repeat units; and the sequence content of the indel. b, Stacked bar
plot showing the proportional contribution of mutational signatures to indels
across the 632 colonies derived from normal bronchial cells, extracted using
the HDP. Within each patient, colonies are sorted from left to right by
increasing indel burden (bar chart in dark grey above coloured
signature-attribution stacks).

Extended Data Fig. 8 | DBS signatures in the sample set. a, Six DBS signatures
were extracted by the HDP. The contributions of different types of double-base
substitution to each signature are shown, grouped by the sequence that is
mutated and by what it is mutated to. Five of the signatures have been observed
in cancer genomes, and one (DBS Sig-C) is a novel signature that was extracted
here. b, Stacked bar plot showing the proportional contribution of mutational
signatures to double-base substitutions across the 632 normal bronchial cells,
extracted using the HDP. Note that some of the colonies in children have no
double-base substitutions. Within each patient, colonies are sorted from left to
right by increasing burden of double-base substitutions (bar chart in dark grey
above coloured signature-attribution stacks).

Extended Data Fig. 9 | Driver mutations in normal bronchial epithelium.
a, Stick plots showing distribution of mutations in TP53, NOTCH1 and other
genes that were significantly mutated in our sample set. Mutations are
coloured by type. The gene structure is shown horizontally in the centre of
each plot, with domains as coloured bars. Above the gene are mutations in this
sample set, and below the gene are mutations found in squamous cell
carcinomas from the TCGA sample set. b, Fraction of cells with driver
mutations in TP53 (left), NOTCH1 (middle) or all other significant cancer genes
(right), split by smoking status.


Statistical parameters


When statistical analyses are reported, confirm that the following items are present in the relevant location (e.g. figure legend, table legend, main 
text, or Methods section). 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


An indication of whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistics including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND 
variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Clearly defined error bars 
State explicitly what error bars represent (e.g. SD, SE, Cl) 


Our web collection on statistics for biologists may be useful. 


Software and code 


Policy information about availability of computer code 


Data collection: Image processing from sequencing data using standard Illumina X10 pipeline.


Data analysis: Alignment and variant calling performed using Sanger Institute's custom pipeline. Single-nucleotide substitutions were called using the
CaVEMan (cancer variants through expectation maximization) algorithm (https://github.com/cancerit/CaVEMan). Small insertions and
deletions were called using the Pindel algorithm (https://github.com/genome/pindel). Rearrangements were called using the BRASS
(breakpoint via assembly) algorithm (https://github.com/cancerit/BRASS).


List of programs and software:
R - version 3.5.1
BWA-MEM - version 0.7.17-r1188 (https://sourceforge.net/projects/bio-bwa/)
CaVEMan - version 1.11.2
Pindel - version 2.2.5
Brass - version 6.1.2
ASCAT NGS - version 4.1.2
Xenome (https://github.com/data61/gossamer/blob/master/docs/xenome.md)
deepSNV - version 1.28.0 (https://bioconductor.org/packages/release/bioc/html/deepSNV.html)
ANNOVAR (http://wannovar.wglab.org/)
IGV (http://software.broadinstitute.org/software/igv/)
JBrowse (https://jbrowse.org/)
cgpVAF (https://github.com/cancerit/vafCorrect)
RPhylip - version 0.1.23 (http://www.phytools.org/Rphylip/)
hdp - version 0.1.5 (https://github.com/nicolaroberts/hdp)
MutationalPatterns - version 1.8.0 (https://bioconductor.org/packages/release/bioc/html/MutationalPatterns.html)
dNdScv - version 0.0.1 (https://github.com/im3sanger/dndscv)
Telomerecat - version 3.1.2 (https://github.com/jhrf/telomerecat)


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers 
upon request. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


Sequence data that support the findings of this study have been deposited in the European Genome-Phenome Archive (https://www.ebi.ac.uk/ega/home) under 
accession number EGAD00001005193. Somatic mutation calls, including single base substitutions, indels and structural variants, from all 632 samples have been 
deposited on Mendeley Data with the identifier: http://dx.doi.org/10.17632/b53h2kwpyy.2. 


Field-specific reporting 


Please select the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 


Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences 


For a reference copy of the document with all sections, see nature.com/authors/policies/ReportingSummary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size Sample size was chosen to give good representation of inter-patient and intra-patient variability in mutation burden. 


Data exclusions Colonies that were oligoclonal or had low mean coverage (<8x) (Extended Figure 2e) were excluded because their mutation catalogues would be inaccurate. One 
outlying cell in an ex-smoker with >10,000 mutations is excluded from the plot of Figure 4 to improve visualisation. 


Replication No experimental replication has yet been attempted. 
Randomization Not applicable - this is a descriptive study, not an intervention study. 


Blinding Not applicable - all dependent variables were computationally generated (mutation counts, signatures etc) and statistical analyses were 
prespecified. 
Validation Antibodies were validated by the manufacturer. 


Eukaryotic cell lines 


Policy information about cell lines 


Cell line source(s) 3T3-J2 feeder cells were kindly provided by Prof. Fiona Watt (King's College London) 
Authentication The feeder cells had whole genome sequencing performed, confirming their murine origin and clonal derivation. 
Mycoplasma contamination The cells tested negative for Mycoplasma by PCR (PMCID: PMC202165) 
Human research participants 


Policy information about studies involving human research participants 


Population characteristics We analysed single cell-derived colonies from bronchial epithelium of 16 subjects, including 3 children, 4 never-smokers, 6 ex-smokers 
and 3 current smokers. Clinical characteristics of the cohort are described in Supplementary Table 1. Of the ex-smokers, 
2 had had a previous cancer treated with curative intent, and 5 had a carcinoma in situ or invasive squamous cell carcinoma. The 
children in the cohort had bronchoscopy for investigation or follow-up of congenital anomalies. 


Recruitment Recruited through University College Hospitals, London, UK. Our cohort does potentially suffer from recruitment bias, since 
samples could only ethically be obtained from individuals undergoing a clinically indicated bronchoscopy. 
Methodology 

Sample preparation The epithelium was dissected away from the underlying stroma and fetal bovine serum (FBS) was added to a final concentration 
of 10%. Both the epithelium and stroma were combined and digested in 0.1% trypsin/EDTA at 37 °C for 30 minutes. The solution 
was neutralised with FBS to a final concentration of 10% and added to the neutralised dispase solution1. Cells were passed 
through a 100 µm cell strainer and stained in sorting buffer (1x PBS, 1% FBS, 25 mM HEPES and 1 mM EDTA) with anti-CD45-PE 
(BD Pharmingen 555483, 1:200), anti-CD31-PE (BD Pharmingen 555446, 1:200), anti-EPCAM-APC (Biolegend 324208, 1:50) 
antibodies and DAPI (1 µg/ml). 
Synucleinopathies are neurodegenerative diseases that are associated with the 
misfolding and aggregation of α-synuclein, including Parkinson’s disease, dementia 
with Lewy bodies and multiple system atrophy. Clinically, it is challenging to 
differentiate Parkinson’s disease and multiple system atrophy, especially at the early 
stages of disease. Aggregates of α-synuclein in distinct synucleinopathies have been 
proposed to represent different conformational strains of α-synuclein that can 
self-propagate and spread from cell to cell. Protein misfolding cyclic amplification 
(PMCA) is a technique that has previously been used to detect α-synuclein aggregates 
in samples of cerebrospinal fluid with high sensitivity and specificity. Here we show 
that the α-synuclein-PMCA assay can discriminate between samples of cerebrospinal 
fluid from patients diagnosed with Parkinson’s disease and samples from patients 
with multiple system atrophy, with an overall sensitivity of 95.4%. We used a 
combination of biochemical, biophysical and biological methods to analyse the 
product of α-synuclein-PMCA, and found that the characteristics of the α-synuclein 
aggregates in the cerebrospinal fluid could be used to readily distinguish between 
Parkinson’s disease and multiple system atrophy. We also found that the properties of 
aggregates that were amplified from the cerebrospinal fluid were similar to those of 
aggregates that were amplified from the brain. These findings suggest that 
α-synuclein aggregates that are associated with Parkinson’s disease and multiple 
system atrophy correspond to different conformational strains of α-synuclein, which 
can be amplified and detected by α-synuclein-PMCA. Our results may help to improve 
our understanding of the mechanism of α-synuclein misfolding and the structures of 
the aggregates that are implicated in different synucleinopathies, and may also enable 
the development of a biochemical assay to discriminate between Parkinson’s disease 
and multiple system atrophy. 


The misfolding and aggregation of α-synuclein (α-syn) involves a 
mechanism of seeding and nucleation, in which initial seeds of α-syn 
recruit other soluble monomers that assemble to form aggregates. 
Aggregates of α-syn circulate in biological fluids such as the 
cerebrospinal fluid (CSF) and blood. The process of protein misfolding and 
aggregation appears to begin years or decades before the onset of 
clinical signs, and thus detection of α-syn aggregates in easily 
accessible biological fluids may enable the biochemical diagnosis of 
synucleinopathies. In previous studies, the PMCA technology has been adapted 
to enable highly sensitive and specific detection of α-syn aggregates 
that are produced in vitro or derived from the biological fluids of 
patients with synucleinopathies. The α-syn-PMCA assay (also referred 
to as α-syn-RT-QuIC) uses the seeding-nucleation mechanism to 
cyclically amplify the process of protein misfolding, enabling the 
efficient amplification of small quantities of α-syn oligomers and thereby 
facilitating their detection. 

In the α-syn-PMCA assay, the kinetics of aggregation of α-syn are 
monitored by the fluorescence signal of thioflavin T (ThT), a dye that 
is specific to amyloid fibrils. Previous studies have noted that the 
maximum fluorescence signal of the α-syn-PMCA product from 
reactions that were initiated with CSF from patients with multiple system 
atrophy (MSA) was smaller than the corresponding fluorescence signal 


Mitchell Center for Alzheimer’s Disease and Related Brain Disorders, Department of Neurology, University of Texas McGovern Medical School at Houston, Houston, TX, USA. Department of 
Microbiology and Molecular Genetics, University of Texas McGovern Medical School at Houston, Houston, TX, USA. Department of Neurology, Mayo Clinic, Rochester, MN, USA. Division of 
Hematology, Department of Internal Medicine, University of Texas McGovern Medical School at Houston, Houston, TX, USA. Department of Physics, Chemistry and Biology, Linköping 
University, Linköping, Sweden. These authors contributed equally: Sandra Pritzkow, Nicolas Mendez. e-mail: Claudio.Soto@uth.tmc.edu 


Nature | Vol 578 | 13 February 2020 | 273 


Article 


[Fig. 1, panels a-h. Panels a-d: ThT fluorescence (AU) by group (HC, MSA, PD) and against time (h). Panels e-h: dye fluorescence (AU) against wavelength (nm) for CSF- and brain-derived samples from patients with PD or MSA.] 

Fig. 1 | Differential interaction of amyloid-binding dyes with α-syn 
aggregates derived from patients with PD or patients with MSA. 
a, b, Samples of CSF (40 µl) from patients with PD, patients with MSA or 
healthy control individuals (HC) were subjected to α-syn-PMCA and the extent 
of aggregation was monitored by ThT fluorescence. a, Maximum fluorescence 
values (measured at plateau of aggregation) for PD (n = 94; red), MSA (n = 75; 
blue) and healthy controls (n = 56; black). Each dot represents an individual 
biological sample measured in duplicate and data are mean ± s.e.m. 
b, Representative aggregation curves of α-syn in the presence of CSF from 
patients with PD (n = 47), patients with MSA (n = 30) and healthy controls 
(n = 42). Data are mean ± s.e.m. of all patients analysed in each group. 
c, d, Frozen brain samples from patients with pathologically confirmed PD or 
MSA, or from healthy controls, were homogenized at 10% w/v. A 0.001% 
dilution of brain homogenate was used for the α-syn-PMCA reaction. 


for CSF from patients with Parkinson’s disease (PD) or dementia with 
Lewy bodies. To further investigate the possibility that PD and MSA 
can be differentiated by α-syn-PMCA, we performed a study using 94 
samples of CSF from patients with PD, 75 from patients with MSA and 
56 from control individuals with other neurological diseases (Methods; 
see Extended Data Table 1 for patient demographics). The maximum 
ThT fluorescence after α-syn-PMCA was significantly greater in 
samples from patients with PD than in samples from patients with MSA 
(Fig. 1a). Products of α-syn-PMCA that were derived from samples from 
patients with MSA had a maximum fluorescence of less than 1,800 
units, whereas for PD this value ranged between 2,000 and 8,000 units. 
Control samples did not show any fluorescence over the background 
levels (Fig. 1a). The kinetics of aggregation for all samples in this study 
are shown in Extended Data Fig. 1. Of the 75 samples from patients with 
MSA, 4 had an aggregation profile that was compatible with the PD 
strain and, conversely, 3 of the 94 samples from patients with PD had a 
profile typical of MSA. From this cohort of samples, the overall 
sensitivity for diagnosis of PD and MSA relative to controls, calculated 
from receiver operating characteristic curves, was 93.6% and 84.6%, respectively. In 
both cases, specificity was 100%. Comparing differential diagnosis 
of PD and MSA, we estimated that of the 88 samples from patients 
with clinically diagnosed PD that showed α-syn seeds by α-syn-PMCA, 
85 were correctly identified as PD in our assay (that is, a sensitivity of 
96.6%). Of the 65 samples from patients with MSA that were shown by 
α-syn-PMCA to contain α-syn aggregates, 61 had the typical signature of 
MSA (maximum fluorescence of less than 1,800), indicating a sensitivity 
c, Maximum fluorescence values for PD (n = 3), MSA (n = 3) and healthy controls 
(n = 3). Each dot represents an individual biological sample measured in 
duplicate and data are mean ± s.e.m. of three patients in each group. 
****P < 0.0001 by one-way analysis of variance (ANOVA) followed by Tukey’s 
multiple comparison test (a, c). d, Aggregation profiles of α-syn in the presence 
of samples from the brain of patients with PD (n = 3), patients with MSA (n = 3) 
and healthy controls (n = 3). Data are mean ± s.e.m. of three patients in each 
group. e-h, Differential binding of two amyloid-conformation-specific dyes 
(HS-199 and HS-169) to α-syn aggregates obtained after two rounds of 
α-syn-PMCA in samples from the CSF (e, g; n = 43) or the brain (f, h; n = 3) of different 
patients with PD or MSA. Excitation was at 540 nm and the emission spectrum 
was recorded between 580 and 800 nm. The chemical structures of HS-199 and 
HS-169 are also shown. Each experiment was performed in duplicate and data 
are mean ± s.e.m. (for many points the error bars are smaller than the symbols). 


of 93.8%. Combining all samples, we correctly distinguished PD from 
MSA in 146 of the 153 samples analysed—an overall sensitivity of 95.4%. 
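
The differential-diagnosis percentages quoted above follow directly from the reported counts. As a quick arithmetic check, a minimal Python sketch (the function and variable names are illustrative and not part of the study) reproduces them:

```python
def sensitivity(correct: int, total: int) -> float:
    """Percentage of seed-positive samples assigned to the correct disease."""
    return 100.0 * correct / total


# Counts as reported in the text.
pd_correct, pd_total = 85, 88      # PD samples with alpha-syn seeds
msa_correct, msa_total = 61, 65    # MSA samples with alpha-syn aggregates

pd_sens = sensitivity(pd_correct, pd_total)
msa_sens = sensitivity(msa_correct, msa_total)
overall = sensitivity(pd_correct + msa_correct, pd_total + msa_total)

print(f"PD: {pd_sens:.1f}%  MSA: {msa_sens:.1f}%  overall: {overall:.1f}%")
# prints "PD: 96.6%  MSA: 93.8%  overall: 95.4%"
```

The denominators here are the samples in which seeding was detected (88 and 65), not the full cohorts of 94 and 75, which is why these values differ from the sensitivities quoted relative to controls.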

The above data were obtained from different cohorts of patients and 
across several separate experiments. To illustrate the typical profile 
of α-syn-PMCA aggregation for samples of PD and MSA, we took the 
largest individual cohort of samples analysed in Fig. 1a and plotted 
data from samples that were identified as PD (n = 47) and MSA (n = 30) 
(Fig. 1b). The maximum fluorescence and the kinetics of aggregation 
were consistently different for PD and MSA, with samples from patients 
with MSA aggregating faster but reaching a lower fluorescence plateau 
than those from patients with PD (Fig. 1b). To determine whether the 
aggregates present in the CSF are representative of those found in the 
brain, we also amplified brain samples from three different patients with 
PD or MSA. To reduce the chance of other brain components interfering 
in the reaction, we started the PMCA assay with a 10⁻⁵ dilution of brain 
homogenate. Under these conditions, we found that amplified 
brain-derived α-syn aggregates showed the typical signature of PD or MSA, 
both in terms of the maximum ThT fluorescence (Fig. 1c) and the 
kinetics of aggregation (Fig. 1d). These results suggest that the aggregates 
present in the CSF of patients reflect the aggregates present in the brain. 
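
The fluorescence signature described above can be written as a simple decision rule. The following Python sketch is only an illustration and is not the analysis used in the study: the thresholds (below 1,800 units for MSA, 2,000 to 8,000 units for PD) are those quoted in the text, whereas the `background` cutoff, the function name and the example values are hypothetical.

```python
# Thresholds quoted in the text (arbitrary ThT fluorescence units).
MSA_UPPER = 1800.0   # MSA products plateaued below this value
PD_LOWER = 2000.0    # PD products plateaued between this value ...
PD_UPPER = 8000.0    # ... and this value


def classify_plateau(max_fluorescence: float, background: float) -> str:
    """Label a sample from its maximum ThT fluorescence at plateau.

    `background` is a hypothetical cutoff for "no signal over background";
    the text does not give a numerical value for it.
    """
    if max_fluorescence <= background:
        return "no seeding detected"
    if max_fluorescence < MSA_UPPER:
        return "MSA-like"
    if PD_LOWER <= max_fluorescence <= PD_UPPER:
        return "PD-like"
    # Falls in the gap between 1,800 and 2,000 units, or above 8,000 units.
    return "indeterminate"


if __name__ == "__main__":
    # Illustrative plateau values only, not measurements from the study.
    for value in (150.0, 1200.0, 1900.0, 5200.0):
        print(value, classify_plateau(value, background=300.0))
```

A rule of this kind uses only the plateau height; the aggregation kinetics, which also differed between PD and MSA, are not taken into account here.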

Notably, the qualitative differences in ThT fluorescence were 
maintained when the α-syn aggregates that were amplified from samples 
of CSF from patients with PD or patients with MSA were replicated 
serially at the expense of monomeric α-syn (Extended Data Fig. 2). For 
these studies, an aliquot of the final product of the first α-syn-PMCA 
reaction (starting from CSF samples) was diluted 100-fold into fresh 
α-syn monomers, and a new α-syn-PMCA assay was performed. This 
was repeated several times, and the product maintained the 
high-fluorescence signal for PD and low-fluorescence signal for MSA (Extended 
Data Fig. 2). To further study the properties of the aggregates that were 
amplified from patients with PD or with MSA, we selected samples from 
43 patients with PD and 43 patients with MSA (see Extended Data Table 2 
for the demographic characteristics of these patients). The selection of 
the 43 samples for each disease was done by eliminating samples that 
did not aggregate (false negatives) and including those that had the 
typical signatures of PD or MSA, as indicated above (Fig. 1b, Extended 
Data Fig. 1). The majority of the characterization studies were done 
with samples from the second cycle of amplification; this was 
necessary to generate sufficient material and also to reduce any interference 
from the CSF, which is important for some of the techniques used (for 
example, circular dichroism and Fourier-transform infrared (FTIR) 
spectroscopy). 
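
Because each new round is seeded with a 100-fold dilution of the previous product, the material carried over from the first reaction falls geometrically. A minimal sketch of this arithmetic is given below, assuming that exactly 1% of each product is transferred; the function name is illustrative.

```python
from fractions import Fraction


def carryover_fraction(round_number: int, dilution_factor: int = 100) -> Fraction:
    """Fraction of first-round product carried into a given round.

    Round 1 is the direct amplification from CSF (fraction 1 by convention);
    each later round is seeded with the previous product diluted
    `dilution_factor`-fold.
    """
    if round_number < 1:
        raise ValueError("round_number must be 1 or greater")
    return Fraction(1, dilution_factor ** (round_number - 1))


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"round {n}: {float(carryover_fraction(n)):.0e} of round-1 product")
```

On this reckoning, only one part in a million of the first-round product would remain by a fourth round, so a fluorescence signature that persists across rounds is more plausibly explained by templating onto fresh monomer than by carry-over of the original material.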

First, we wanted to verify that the differences in ThT fluorescence 
did not simply reflect different amounts of aggregates at the end of 
the reaction. To investigate this further, we performed sedimentation 
assays to separate the pools of soluble and aggregated α-syn. We 
measured the amount of protein pelleting after centrifugation at 20,000g 
for 30 minutes, using silver staining after SDS-PAGE (Extended Data 
Fig. 3a) and dot blot analysis (Extended Data Fig. 3b). We also measured 
the amount of protein remaining in the supernatant, using the 
bicinchoninic acid assay (Extended Data Fig. 3c). The results clearly showed 
that the amount of aggregates produced at the end of the α-syn-PMCA 
assay was the same in both the PD and the MSA samples. Our 
interpretation of these results is that either the accessibility or the mode of 
interaction of ThT with aggregates differs between aggregates derived 
from patients with PD and those derived from patients with MSA, and 
that this probably reflects structural differences in the aggregates. 

To study the differences between aggregates associated with PD and 
aggregates associated with MSA in more detail, we first used a panel of 
thiophene-based ligands that have previously been shown to interact 
with amyloid aggregates and produce a different spectrum depending 
on the structural characteristics of the aggregates. The conjugated 
thiophene backbone is flexible and thus the binding and fluorescence 
emission of the molecules depends on the conformational properties 
of the aggregates, providing a specific spectral fingerprint of different 
aggregates. These compounds have previously been shown to 
discriminate between different conformational strains of prions, amyloid-β 
and tau proteins. We analysed a set of seven different 
thiophene-based ligands and found that some of them showed substantially 
different capacities to interact with α-syn aggregates derived from PD 
samples compared to those derived from MSA samples (Fig. 1e-h). 
HS-199 showed a very specific binding affinity and high emission of 
fluorescence for PD aggregates, whereas the fluorescence of this dye in the 
presence of MSA aggregates was very low (Fig. 1e). Similar results were 
obtained when analysing samples derived from brain extracts (Fig. 1f), 
further supporting the conclusion that aggregates amplified from the 
CSF and the brain are equivalent. Conversely, the HS-169 dye appeared 
to bind preferentially to MSA aggregates over PD aggregates, again in 
samples amplified from both the CSF (Fig. 1g) and the brain (Fig. 1h). 

To analyse the biochemical differences between α-syn aggregates 
derived from patients with PD and from patients with MSA, we examined 
their resistance to proteolytic degradation and performed 
epitope-mapping experiments. Limited protease digestion is commonly used 
to distinguish prion strains. Aggregates of α-syn derived by seeding 
and amplification from the CSF of patients with PD or patients with 
MSA differed in their extent of protease resistance and in the size of 
the core fragment that was resistant to degradation, as analysed by a 
panel of different antibodies (Fig. 2a-c; see Extended Data Fig. 4 for the 
study done with a larger number of samples). Aggregates of α-syn that 
were amplified from the CSF of patients with PD or patients with MSA 
were very resistant to degradation, even after treatment with a high 
Fig. 2 | Protease resistance and epitope mapping of α-syn aggregates 
derived from the CSF or the brain of patients with PD or patients with MSA. 
a-c, α-Syn-PMCA products starting from samples of CSF from patients with 
MSA or patients with PD were incubated without (-) or in the presence of 
increasing concentrations of proteinase K (PK; 0.001, 0.01, 0.1 and 1 mg ml⁻¹) at 
37 °C for 1 h. Samples were subjected to western blotting using three different 
antibodies against α-syn: N-19 (Santa Cruz), which recognizes the N-terminal 
region (residues 1-50) of α-syn (a); anti-α-syn clone 42 (BD Biosciences), which 
is raised against the middle region of α-syn (residues 15-123) (b); and 211 (Santa 
Cruz), which is reactive against the C-terminal region of α-syn (residues 
121-125) (c). Similar results were obtained for three other patients analysed per 
disease (Extended Data Fig. 4). d, Profiles of digested fragments from five 
patients in each group, developed with the BD clone 42 anti-α-syn antibody. 
The results for all of the PD (n = 43) and MSA (n = 43) samples analysed are 
shown in Extended Data Fig. 5. For the experiments in a-d, we used the 
aggregates from the second round of amplification. e, Profile of 
proteinase-K-resistant fragments after serial rounds of α-syn-PMCA. The first round 
corresponds to direct amplification from the CSF. For the second round of 
amplification, aggregates produced in the first round were diluted 100-fold 
into fresh α-syn monomer substrate and a new round of α-syn-PMCA was 
performed. The assay was then repeated for the third and fourth rounds using 
amplified α-syn aggregates (1%) from the previous round. As before, amplified 
aggregates were treated with proteinase K (1 mg ml⁻¹) and blots were developed 
with the BD clone 42 anti-α-syn antibody. f, Proteinase K resistance profiles of 
aggregates amplified from the brain of patients with neuropathologically 
confirmed PD (n = 3) or MSA (n = 3). Molecular weight markers (kDa) are 
indicated on the left of each blot. 


concentration of proteinase K (1 mg ml⁻¹) for 1 hour. Under these 
conditions, protease-resistant fragments mostly mapped to the N-terminal 
(Fig. 2a) and middle (Fig. 2b) regions of the protein. Conversely, the 
C-terminal region of α-syn appeared to be fully degraded after 
incubation with more than 0.01 mg ml⁻¹ of proteinase K (Fig. 2c), which suggests 
that this part of the protein may not be implicated in the formation of 
the aggregates (consistent with previous structural studies of α-syn 
fibrils). Notably, the size and number of protease-resistant bands 
that were detectable by antibodies directed to the middle region of 
α-syn (residues 15-123) differed substantially between PD and MSA. Four 
bands with molecular weights ranging from 4 to 10 kDa were detected 
for samples from patients with PD, whereas only two bands (4 and 6 kDa) 
were detected for samples from patients with MSA (Fig. 2b, d). This 
signature was observed across all of the 43 PD and 43 MSA samples that were 
analysed (Fig. 2d shows 5 representative samples per disease; Extended 
Data Fig. 5 shows all 86 samples). The signature was maintained after 
serial replication in vitro by α-syn-PMCA (Fig. 2e, Extended Data Fig. 6), 
albeit with some small variability in the relative proportions of different 
bands between rounds of amplification. This result provides further 
evidence that α-syn-PMCA maintains the biochemical and structural 
properties of α-syn aggregates. We also analysed the pattern of 
proteinase K resistance of α-syn aggregates that were amplified from the 
brain of patients with PD or patients with MSA. The profiles of 
protease-resistant fragments from brain exhibited the typical signature of PD or 
MSA (Fig. 2f), again suggesting that the aggregates present in the CSF 
are equivalent to those that accumulate in the brain. 
Fig. 3 | Structural differences between α-syn aggregates derived from 
patients with PD or patients with MSA. a, Circular dichroism spectra of α-syn 
aggregates from the CSF of patients with PD (red) or patients with MSA (blue), 
amplified by two rounds of α-syn-PMCA. Spectra were recorded from 35 µM 
suspensions of α-syn aggregates, as described in Methods. Measurements 
were taken for all of the PD (n = 43) and MSA (n = 43) samples analysed and data 
(molar ellipticity) are mean ± s.e.m. b, A similar experiment was performed for 
α-syn aggregates that were amplified from the brain of patients with PD (n = 3) 
or patients with MSA (n = 3). c, FTIR spectra of α-syn aggregates that were 
obtained after two rounds of seeding and amplification of samples of CSF from 
patients with PD (n = 10) or patients with MSA (n = 10). The solution of 
aggregated proteins (5 µl; 5 mg ml⁻¹) was analysed with an FTIR-4100 
spectrometer (JASCO). d, Cryo-ET was performed to evaluate structural 
differences between fibrils from patients with PD and fibrils from patients with 
MSA. Central slices of representative subtomograms of PD-associated fibrils 
and MSA-associated fibrils are shown. The negative-stained fibrils were imaged 
with a 300-kV electron microscope (Methods). Yellow arrows indicate twists in 
the filaments. Scale bar, 20 nm. e, Three-dimensional density maps segmented 
from the original tomograms. Boxed densities are magnified views. f, 
Three-dimensional helical models were built that overlapped with the corresponding 
densities of PD- and MSA-associated fibrils, including a magnification of the 
central region. g, Helical models showing the periodicity of twisting of PD- or 
MSA-associated fibrils. Black arrows indicate the twist in the 3D model of the 
filament. h, Quantification of the periodic spacing (in nm) in many different 
fibrils derived from samples from patients with PD (n = 3) or patients with MSA 
(n = 3). Each dot corresponds to a different fibril and data are 
mean ± s.e.m. *P < 0.05 by one-way ANOVA followed by Tukey’s multiple 
comparison test. 


Circular dichroism spectroscopy showed that the secondary 
structure of α-syn aggregates in both PD and MSA predominantly comprises 
β-sheets (as illustrated by a negative peak at around 220 nm) (Fig. 3a). 
Analysis of the spectra indicates that MSA aggregates have a higher 
proportion of β-sheet structure than PD aggregates. Analogous results 
were obtained in the three samples from patients with PD and three 
samples from patients with MSA that were amplified from the brain 
rather than the CSF (Fig. 3b). To confirm these results using a different 
methodology, we used FTIR spectroscopy to estimate the secondary 
structures of α-syn aggregates in samples from a group of randomly 
selected patients with PD (n = 10) and patients with MSA (n = 10) (Fig. 3c). 
The MSA-derived aggregates showed a spectrum dominated by parallel 
β-sheet structure (peak at 1,640 cm⁻¹), whereas for PD-derived 
aggregates there was also another clear peak at around 1,652 cm⁻¹, which 
could be assigned to either α-helix- or random-coil-type structures 
(Fig. 3c). 

To gain further insight into the structures of both species of α-syn, we 
performed cryo-electron tomography (cryo-ET) studies. Single-particle 
Fig. 4 | Cytotoxicity of amplified α-syn aggregates from the CSF of patients 
with PD or patients with MSA. a, b, RK13 cells (a) (10,000 cells), or neuronal 
precursor cells derived from human induced pluripotent stem cells generated 
as previously described (b) (5,000 cells), were plated in a 96-well plate. After 
24 h, cells were treated for 24 h for RK13 cells and 48 h for neuronal precursor 
cells with different concentrations of amplified α-syn fibrils from samples of 
CSF from patients with MSA or patients with PD. Cell viability was determined by 
MTT assay. Experiments were carried out in triplicate, each dot represents an 
individual replicate and data are mean ± s.e.m. *P < 0.05, ***P < 0.001, 
****P < 0.0001 by one-way ANOVA followed by Tukey’s multiple comparison test. 


cryo-electron microscopy (cryo-EM) has previously been used to 
determine the high-resolution structure of α-syn aggregates that were 
generated in vitro. Instead of taking single shots of two-dimensional (2D) 
images for a given area of a sample grid (as in single-particle cryo-EM), 
cryo-ET takes multiple shots in the same area by tilting the sample in a 
series of angles. A three-dimensional (3D) tomogram can be directly 
reconstructed from the series of tilts. To increase the contrast of the 
tomographic images, we negatively stained the fibrils amplified from 
the CSF of patients with PD or patients with MSA. We took 17- and 22-tilt 
series for PD and MSA samples, respectively (see ‘Cryo-ET analysis and 
3D reconstructions’ in Methods for details). The tomograms (Fig. 3d, 
e) had enough contrast for us to determine that both fibrils were 
composed of two protofilaments that intertwine in a left-handed helix with 
a diameter of around 9 nm (see Extended Data Fig. 7 for more images of 
representative fibrils from three different patients). This is consistent 
with the high-resolution structure obtained by cryo-EM for full-length 
α-syn aggregates that were prepared in vitro. However, the lengths 
of fibril twists clearly varied between PD and MSA. On the basis of 
individual measurements of helical diameter and twist lengths, we were able 
to manually build helical models (Fig. 3f, g) guided by the segmented 
fibril densities (Fig. 3e). PMCA-derived α-syn aggregates from patients 
with PD were composed of long stretches of straight filaments with 
helical twists that generally ranged from 76.6 to 199 nm in length (Fig. 3g). 
By contrast, α-syn filaments from patients with MSA had shorter twists 
that mostly ranged from 46 to 105 nm in length (Fig. 3g). In accordance 
with this, measurements of periodic spacing indicated that the average 
twisting distance was significantly different between fibrils associated 
with PD and fibrils associated with MSA (65.2 ± 3.8 nm (mean ± s.e.m.) 
in MSA fibrils, n = 104 from 3 different patients; 108.5 ± 6.1 nm in PD 
fibrils, n = 104 from 3 different patients) (Fig. 3h). These data indicate 
that the structures of α-syn aggregates derived from patients with 
PD and from patients with MSA are clearly different on the basis of 
their average periodicities of helical twists. Notably, previous studies 
using immuno-electron microscopy showed that non-amplified 
brain-derived α-syn filaments from patients with MSA are predominantly 
twisted, whereas those from patients with PD are mostly straight. 
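
To give a sense of the size of the difference in periodic spacing, the reported summary statistics can be converted to standard deviations and compared directly. The Python sketch below uses Welch's t statistic computed from the means, s.e.m. values and fibril counts quoted above. This is a swapped-in illustration, not the test reported for Fig. 3h (one-way ANOVA followed by Tukey's test), and it assumes that each fibril is an independent observation even though the fibrils came from three patients per group. The function names are illustrative.

```python
import math


def sd_from_sem(sem: float, n: int) -> float:
    """Recover the standard deviation from the s.e.m. and the sample size."""
    return sem * math.sqrt(n)


def welch_t(mean_a: float, sem_a: float, n_a: int,
            mean_b: float, sem_b: float, n_b: int) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a, var_b = sem_a ** 2, sem_b ** 2   # squared standard errors
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


# Summary statistics as quoted in the text (nm); n is the number of fibrils.
t_stat, dof = welch_t(108.5, 6.1, 104, 65.2, 3.8, 104)

print(f"s.d. PD: {sd_from_sem(6.1, 104):.1f} nm, "
      f"s.d. MSA: {sd_from_sem(3.8, 104):.1f} nm")
print(f"Welch t = {t_stat:.2f}, d.f. = {dof:.0f}")
```

If the patient rather than the fibril were taken as the unit of analysis (n = 3 per group), the effective sample size would be much smaller, so the statistic above should be read as descriptive only.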
To explore whether aggregates derived from the CSF of patients 
with PD and patients with MSA have biological differences, we 
studied their toxicity in cell culture. For these experiments, we used a cell 
line that is often used in the prion field to study prion replication and 
toxicity (RK13) (Fig. 4a), together with human neuronal precursor 
cells derived from induced pluripotent stem cells (Fig. 4b). Induced 
pluripotent stem cells and neuronal precursors were generated and 
characterized from fibroblasts obtained from a healthy individual, 
as previously described. We tested cytotoxicity by incubating cells 
with different concentrations of α-syn aggregates derived from the 
CSF of patients with PD or patients with MSA. MSA-derived aggregates 
showed highly significant toxicity in RK13 cells, even at concentrations 
of 1.25 µM; by contrast, PD-derived aggregates began to show 
significant toxicity only at 5 µM (Fig. 4a), indicating that MSA aggregates are 
more toxic than PD aggregates. A similar conclusion was obtained in 
the neuronal precursor cells that were derived from human induced 
pluripotent stem cells (Fig. 4b). 

The prion-like behaviour of α-syn aggregates is a recently recognized 
principle that may have a central role in the pathological progression 
of various synucleinopathies. Indeed, the ability of α-syn 
aggregates to propagate their misfolded abnormalities enables the 
progressive spreading of damage from cell to cell. One of the tenets of 
the prion principle is that the misfolded protein can exist in different 
self-perpetuating conformational strains, which have the ability to 
faithfully template the misfolding of the normal monomeric protein 
in the abnormal-strain-specific conformation. Here we have shown 
that the prion principle can be used as an effective strategy to 
cyclically amplify the process of protein misfolding and thereby enable the 
detection of small amounts of α-syn aggregates in the CSF. Notably, we 
were able to distinguish, with high sensitivity and specificity, between 
samples from patients with two clinically similar synucleinopathies 
(PD and MSA). Moreover, we have shown that the α-syn aggregates 
present in the CSF of patients are representative of those that 
accumulate in the brain, indicating that the α-syn-PMCA assay can measure, 
non-invasively, the pathological species that are associated with 
different synucleinopathies. Our results demonstrate that α-syn aggregates 
exist as distinct conformational strains with different biochemical and 
structural properties, which will help to improve our understanding 
of the pathogenesis of these diseases. Furthermore, our study shows 
that patients with distinct synucleinopathies can be distinguished on 
the basis of the α-syn strain that is present in their CSF. These data may 
enable the development of a biochemical test for the specific diagnosis 
of different disorders that involve the misfolding of α-syn, with 
potential future applications in clinical trials and personalized medicine. 
Methods 


Data reporting 

No statistical methods were used to predetermine sample size, and 
the experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Patient samples 

CSF samples were obtained from 94 patients who were clinically diag- 
nosed with PD, 75 patients who were diagnosed with MSA and 56 control 
individuals (people with other neurological diseases: epilepsy, cervi- 
cal spondylosis, polyneuropathy, muscular dystrophy, viral myositis, 
myelopathy and hydrocephalus). Extended Data Table 1 displays a 
summary of the demographic characteristics of these patients. Most 
samples were collected at the Mayo Clinic, as indicated below. The clini- 
cal diagnoses of probable PD and MSA were made according to interna- 
tionally standardized criteria, including the UK Brain Bank guidelines”. 
CSF samples were collected in the morning using polypropylene tubes 
following lumbar puncture at the L4/L5 or L3/L4 interspace with atrau- 
matic needles after overnight fasting. The samples were centrifuged 
at 3,000g for 10 min at room temperature, aliquoted and stored at 
-80 °C until analysis. Blood cell (red and white) counts and glucose, 
protein and haemoglobin concentrations were determined as previ- 
ously described’. The methods of CSF collection were approved by the 
institutional review boards at the study centres (Mayo Clinic and the 
University of Texas Health Science Center at Houston), and all study 
participants provided written informed consent. 

Brain tissue from patients with PD and patients with MSA was 
obtained from the Banner Sun Health Research Institute. Control brain 
tissue was supplied by NDRI (National Human Tissue Resource Center). 
Frozen samples of frontal cortex were homogenized using a tissue 
grinder in 10% w/v ice-cold PBS (HyClone, SH30256.01) with complete 
protease inhibitor cocktail (Roche). The experiments with human tis- 
sue were performed following the universal precautions for working 
with human specimens and as directed by the Institutional Review 
Board of The University of Texas Health Science Center at Houston 
(HSC-MS-14-0608). 


Expression and preparation of monomeric α-syn 

The purification and characterization of monomeric α-syn was done as 
previously described’. In brief, the pET-21b plasmid carrying the coding 
DNA sequence for human α-syn containing a His-tag at the C terminus” 
was overexpressed in BL21(DE3) pLysS (Invitrogen) Escherichia coli 
cells at 25 °C using 0.1 mM IPTG (isopropyl β-D-thiogalactoside) for 
6 h. The bacterial pellets were lysed in 50 mM NaH₂PO₄ (pH 8.0), 300 
mM NaCl, 10 mM imidazole, 1 mM PMSF, 0.1 mM tris-(2-carboxyethyl) 
phosphine (TCEP) and 1 mg ml⁻¹ lysozyme, followed by sonication on ice. 
The lysate was then centrifuged at 12,000g for 15 min at 4 °C, followed 
by ultracentrifugation at 100,000g for 30 min at 4 °C. The supernatant 
was filtered through a 0.45-µm filter and loaded onto a nickel-affinity 
column (Nickel Sepharose Fast flow, GE Healthcare). Proteins were 
eluted using 250 mM imidazole and α-syn-containing fractions were 
dialysed overnight at 4 °C against PBS, pH 7.4. To remove any preformed 
seeds or aggregates, the protein solution was filtered through a 100-kDa 
cut-off filter (Amicon Ultra, Millipore), separated into small aliquots 
and stored at -80 °C until use. Protein concentration was determined 
by bicinchoninic acid (BCA) assay (Pierce). The purity of the protein 
was evaluated by silver staining. 


α-syn-PMCA 

The α-syn-PMCA (also known as α-syn-RT-QuIC) assay was performed 
as previously described’. In brief, samples of seed-free, monomeric 
α-syn at a concentration of 1 mg ml⁻¹ in 100 mM PIPES, pH 6.5 and 500 
mM NaCl were placed in opaque 96-well plates (Costar, REF 3916) in the 
presence of 5 µM ThT at a final volume of 200 µl. For each test, we added 
40 µl of CSF from patients and controls or 40 µl of brain homogenate 
(at a final concentration of 0.001%). Positive controls consisted of a 
well-documented and previously screened healthy CSF sample spiked 
with preformed α-syn oligomeric seeds. Samples were subjected to 
cyclic agitation (1 min at 500 rpm followed by 29 min without shaking) 
at 37 °C. The increase in ThT fluorescence was monitored at an excita- 
tion of 435 nm and emission of 485 nm, periodically, using a microplate 
spectrofluorometer Gemini-EM (Molecular Devices). 
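
The seeding volumes given above imply a simple dilution calculation (C1 × V1 = C2 × V2). The following Python sketch works through it for the brain homogenate. It assumes, which the text does not state explicitly, that the 200 µl final volume already includes the 40 µl of added sample and that the 10% w/v homogenate is the starting stock; the function names are illustrative only.

```python
# Dilution arithmetic for seeding one reaction well (C1 * V1 = C2 * V2).
# Assumption: the 200 µl final volume includes the 40 µl of added sample.

def required_sample_concentration(final_conc, final_volume_ul, sample_volume_ul):
    """Concentration the added sample must have to reach final_conc in the well."""
    return final_conc * final_volume_ul / sample_volume_ul


def fold_dilution(stock_conc, working_conc):
    """How many-fold the stock must be diluted to give the working solution."""
    return stock_conc / working_conc


# 0.001% (w/v) final homogenate concentration, 40 µl added to a 200 µl reaction
working_conc = required_sample_concentration(0.001, 200.0, 40.0)

# Assumed stock: the 10% (w/v) frontal-cortex homogenate
predilution = fold_dilution(10.0, working_conc)

print(f"homogenate added at {working_conc:.4f}% w/v")
print(f"pre-dilution of the 10% stock: {predilution:.0f}-fold")
```

Under these assumptions the homogenate would be added at 0.005% w/v, that is, after a 2,000-fold dilution of the 10% stock.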

For serial rounds of amplification, an aliquot from the amplified 
material was diluted 100-fold into fresh α-syn monomer substrate 
and a new α-syn-PMCA assay was performed. This was repeated three 
consecutive times to obtain aggregates corresponding to the second, 
third and fourth rounds of amplification. The first round of amplifica- 
tion corresponds to the one initiated with the biological samples (CSF 
or brain homogenate). 
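
As a worked example of the serial scheme, the sketch below computes how far the first-round product has been diluted by the time it reaches each later round. It only tracks the carry-over of original material at 100-fold per round and deliberately ignores the new aggregates formed in each round; the function name is hypothetical.

```python
# Cumulative dilution of first-round material across serial amplification rounds.
# Round 1 is the reaction seeded directly with the biological sample.

def cumulative_dilution(round_number, fold_per_round=100):
    """Fold dilution of the first-round product carried into a given round."""
    if round_number < 1:
        raise ValueError("round_number starts at 1")
    return fold_per_round ** (round_number - 1)


dilutions = {r: cumulative_dilution(r) for r in range(1, 5)}
for r, d in dilutions.items():
    print(f"round {r}: first-round product diluted {d:,}-fold")
```

By the fourth round the first-round product has been diluted a million-fold, so any signal at that stage reflects newly templated aggregates rather than carried-over seed.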


Measurement of protein concentration in the aggregated 
product after amplification 

Samples at the end of the PMCA reaction were centrifuged at 20,000g 
for 30 min at 4 °C. The resultant supernatants were carefully separated 
from the pellets. The amount of aggregated product was measured in 
all samples by three different procedures: (1) protein quantity in pellets 
was measured by silver staining after SDS-PAGE; (2) dot blot analysis 
of sedimented materials; and (3) BCA measurements of total protein 
content in the supernatant fraction. For SDS-PAGE, pellets were resus- 
pended in PBS and separated on a 12% Bis-Tris gel and protein bands 
were visualized by silver staining as per the manufacturer’s protocol. 
For dot blot analysis, 2 µl of resuspended pellets was spotted onto 
nitrocellulose membranes (Amersham Biosciences) and air-dried for 
30 min at room temperature. Blots were blocked with 5% w/v non-fat 
dry milk in Tris-buffered saline-Tween 20 (TBS-T) (20 mM Tris, pH 7.2, 
150 mM NaCl and 0.05% (v/v) Tween 20) at room temperature for 2 h. 
After blocking, the membranes were probed with anti-α-syn antibody 
(BD Bioscience; 1:2,000) and anti-rabbit horseradish peroxidase (HRP)- 
conjugated secondary antibodies (1:5,000). The blots were visualized 
using enhanced chemiluminescence and a western blotting detection 
kit (Amersham Biosciences). Finally, the protein concentration in super- 
natants was determined using a BCA assay kit as per the manufacturer’s 
recommendations. 


Interaction of α-syn aggregates with thiophene-based ligands 
A set of seven thiophene-based ligands (p-FTAA, h-FTAA, HS-68, HS-167, 
HS-169, HS-194 and HS-199) that have previously been shown to dis- 
criminate between different conformational strains composed of 
various proteins’’”” was used in this study. These compounds were 
synthesized and characterized as previously described”? **, or as 
outlined below and in Extended Data Fig. 8 for compound HS-199. 
The stock solution for each compound was prepared in deionized 
water or DMSO at 1.5 mM. For our experiments, we diluted these stocks 
to reach a final concentration of 150 pM. The excitation and emis- 
sion wavelength range was different depending on the molecule, as 
previously described'*”. 


Synthesis and characterization of HS-199 

A mixture of methyl 5’-bromo-[2,2’-bithiophene]-5-carboxylate (140 
mg, 0.462 mmol), (5-formylthiophen-2-yl)boronic acid (80 mg, 0.508 
mmol) (Extended Data Fig. 8), K₂CO₃ (192 mg, 1.39 mmol) in 1,4-dioxane/ 
methanol (8:2, 8 ml, degassed) and PEPPSI-IPr (2 mol%) was heated 
to 80 °C for 30 min. After cooling to room temperature, the pH was 
adjusted to 4 by addition of 1 M HCl and the residue was extracted with 
DCM (3 × 20 ml) and washed with water (3 × 20 ml) and brine (30 ml). 
The combined organic phase was dried over MgSO₄ and the solvent was 
evaporated. The residue was subjected to column chromatography 
using CH₂Cl₂ followed by crystallization from DMF to give a trimer 
(Extended Data Fig. 8) as a yellow solid (115 mg, 74%). 


A few drops of pyridine were added to a cold solution of this trimer 
(0.05 g, 0.150 mmol) and the corresponding 2-methyl-3-alkylbenzothi- 
azolium salt (Extended Data Fig. 8) (46 mg, 0.150 mmol) in an anhydrous 
mixture of MeOH and THF (8:2). The mixture was refluxed until comple- 
tion of the reaction (monitored by TLC, eluent: DCM/MeOH 1%). The 
solvent was evaporated in vacuo to provide a dark red solid, which was 
crystallized from MeOH. The red crystals were collected by filtration, 
washed with cold MeOH and dried in vacuum to afford HS-199 as a dark 
red solid (53 mg, 57%). Extended Data Figure 8 provides a summary of 
this reaction scheme. 
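
The reported 74% yield of the first step can be checked with a short calculation. The sketch below is illustrative only: it reads the quantities of reagent as amounts in mmol, and it estimates the molar mass of the trimer from the second step (0.05 g corresponding to 0.150 mmol) rather than from a structure, so the result is approximate.

```python
# Percent-yield check for the coupling step that gives the trimer.
# Assumptions: reagent quantities are amounts in mmol, and the trimer molar
# mass is estimated from the second step (50 mg of trimer = 0.150 mmol).

def percent_yield(actual_mg, limiting_mmol, product_molar_mass):
    """Isolated mass as a percentage of the theoretical mass."""
    theoretical_mg = limiting_mmol * product_molar_mass  # mmol * g/mol = mg
    return 100.0 * actual_mg / theoretical_mg


trimer_molar_mass = 50.0 / 0.150   # mg / mmol = g/mol, roughly 333
limiting_mmol = 0.462              # the bromo-bithiophene ester is limiting
step1_yield = percent_yield(115.0, limiting_mmol, trimer_molar_mass)

print(f"estimated trimer molar mass: {trimer_molar_mass:.0f} g/mol")
print(f"estimated yield of step 1: {step1_yield:.1f}%")
```

The estimate lands within a percentage point of the reported value, which supports reading the reagent quantities as mmol rather than as concentrations.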

The compound was characterized by infrared (IR) spectroscopy, 
nuclear magnetic resonance and mass spectrometry. IR (neat) 1,697, 
1,594, 1,582, 1,525, 1,446, 1,421, 1,304, 1,245, 1,210, 1,165, 1,098, 1,054, 
1,035, 926, 806, 786, 758 and 744 cm⁻¹. ¹H NMR (300 MHz, DMSO-d6) δ 
8.48-8.41 (m, 2H), 8.27 (d, J = 8.6 Hz, 1H), 7.97 (d, J = 3.9 Hz, 1H), 7.91-7.58 
(m, 7H), 7.50 (d, J = 3.9 Hz, 1H), 4.92 (q, J = 7.0 Hz, 2H), 3.85 (s, 3H), 1.47 
(t, J = 7.0 Hz, 3H). ¹³C NMR (75 MHz, DMSO-d6) δ 170.53, 161.43, 143.01, 
142.17, 140.89, 140.83, 138.37, 137.10, 136.14, 136.01, 134.89, 131.30, 
129.48, 128.26, 128.16, 127.79, 127.69, 126.66, 125.58, 124.38, 116.44, 111.21, 
52.41, 44.29, 14.14. Matrix-assisted laser desorption ionization-time 
of flight (MALDI-TOF): m/z calculated for C₂₅H₂₀NO₂S₄ (M+H)⁺: 495.0. 
Found: 495.0. 


Protease digestion and epitope mapping 

Samples containing α-syn aggregates amplified by PMCA were treated 
with different concentrations of proteinase K at 37 °C for 1 h. The reac- 
tion was stopped by heating the sample in NuPAGE LDS buffer at 95 °C 
for 10 min. The digested products were resolved on 12% Bis-Tris gels (Inv- 
itrogen). Proteins were electrophoretically transferred to nitrocellulose 
membranes (Amersham Biosciences). Membranes were blocked with 
5% w/v non-fat dry milk in PBS-Tween 20 (PBS (Hyclone SH.30258.02), 
pH 7.2, 0.1% (v/v) Tween 20) at room temperature for 1 h. After block- 
ing, the membranes were probed with the following antibodies against 
α-syn: N-19 (Santa Cruz), which recognizes the N terminus (residues 
1-50) of α-syn; anti-α-syn clone 42 (BD Biosciences), which is raised 
against the middle region (residues 15-123) of the protein; and 211 (Santa 
Cruz), which is reactive against the C-terminal region (residues 121-125) 
of α-syn. The blots were developed using ECL prime detection western 
blotting reagents (Amersham Biosciences). 


Circular dichroism 

Solutions containing around 35 µM of α-syn aggregates amplified by 
α-syn-PMCA were used for these studies. Circular dichroism spectra 
were recorded at room temperature using a JASCO J815 spectropola- 
rimeter, with a 1-mm path-length cuvette. Circular dichroism data were 
collected at 0.1-nm resolution and at a scan speed of 200 nm min⁻¹. The 
portion of the circular dichroism spectrum between 250 and 350 nm 
was fitted with a quadratic function and the baseline of the whole spec- 
trum was calculated using the function. Then the calculated baseline 
was subtracted from the circular dichroism spectrum to obtain the 
baseline-corrected circular dichroism spectrum. 
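
The baseline correction described above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the authors' processing script: the function names, the least-squares solver and the test spectrum are all assumptions made for the example.

```python
import math

# Quadratic baseline correction: fit a + b*x + c*x**2 to the 250-350 nm region,
# then subtract the fitted curve from the whole spectrum.

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the normal equations
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def baseline_correct(wavelengths, signal, lo=250.0, hi=350.0):
    """Subtract a quadratic baseline fitted between lo and hi (nm)."""
    centre = (lo + hi) / 2.0  # centre the x axis to keep the fit well conditioned
    region = [(w - centre, s) for w, s in zip(wavelengths, signal) if lo <= w <= hi]
    a, b, c = fit_quadratic([x for x, _ in region], [y for _, y in region])
    return [s - (a + b * (w - centre) + c * (w - centre) ** 2)
            for w, s in zip(wavelengths, signal)]


# Synthetic spectrum: one negative band near 218 nm on a curved baseline
wavelengths = [float(w) for w in range(190, 351)]
signal = [-10.0 * math.exp(-((w - 218.0) / 8.0) ** 2)
          + 0.5 + 0.01 * (w - 300.0) + 1e-4 * (w - 300.0) ** 2
          for w in wavelengths]
corrected = baseline_correct(wavelengths, signal)
print(f"minimum after correction: {min(corrected):.2f}")
```

The fit uses only the region where the protein is not expected to absorb, so the band of interest does not bias the baseline that is subtracted from it.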


FTIR spectroscopy 

FTIR experiments were conducted using an FT/IR-4100 spectrometer 
from JASCO. The product of α-syn-PMCA (5 µl) was placed on the top of 
a diamond PRO450-S attenuated total reflectance unit (JASCO) adapted 
to the FT/IR-4100 system. The system parameters included a resolution 
of 4.0 cm⁻¹ and an accumulation of 80 scans per sample. The data were 
processed using cosine apodization and Mertz phase correction. The 
data were also corrected for attenuated total reflectance and carbon 
dioxide vapour absorption. 


Cryo-ET analysis and 3D reconstructions 
The product of α-syn-PMCA (after 2 rounds of amplification from CSF 
samples from patients with PD or patients with MSA) was sedimented 
at 20,000g for 30 min at 4 °C, resuspended in 100 mM PIPES, pH 6.5 
and 500 mM NaCl, diluted 10-fold in deionized water and loaded onto 
Formvar/Carbon Copper grids. Samples were negatively stained with 
2% uranyl acetate and rapidly frozen in liquid ethane, using a gravity- 
driven plunger apparatus. Materials were imaged at −170 °C using a 
Polara G2 electron microscope (FEI) equipped with a field-emission gun 
and a direct-detection device (Gatan K2 Summit). The microscope was 
operated at 300 kV with a magnification of ×15,500. We used SerialEM*” 
to collect tomographic tilt series at a defocus of around 6 µm, with 
cumulative doses of around 200 e⁻ per Å². For each dataset, 35 image 
stacks were collected in a range from −51° to +51°, using increments 
of 3°. Each stack contained about 10 images, which were first aligned 
using MotionCor2**. The tomograms were reconstructed using IMOD 
software” and were further processed by EMAN software*®. 
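
The tilt scheme and dose budget can be checked with a short calculation. The sketch below assumes, purely for illustration, that the cumulative dose of about 200 electrons per square ångström is spread evenly over the tilts and over the roughly 10 frames in each stack; the text does not say how the dose was distributed.

```python
# Tilt-series bookkeeping: list the tilt angles and split the dose budget.
# Assumption: the cumulative dose is divided evenly over tilts and frames.

def tilt_angles(start=-51, stop=51, step=3):
    """Tilt angles in degrees, inclusive of both ends."""
    return list(range(start, stop + 1, step))


angles = tilt_angles()
total_dose = 200.0        # electrons per square angstrom, approximate
frames_per_stack = 10     # approximate

dose_per_stack = total_dose / len(angles)
dose_per_frame = dose_per_stack / frames_per_stack

print(f"{len(angles)} tilts from {angles[0]} to {angles[-1]} degrees")
print(f"about {dose_per_stack:.1f} e/A^2 per tilt, {dose_per_frame:.2f} per frame")
```

A range of ±51° in 3° steps gives 35 tilts, which matches the 35 image stacks stated for each dataset.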

The helical models were manually built based on individual fibril 
density. The twist lengths of fibrils vary from one to another. We were 
not able to perform subtomogram averaging owing to the heterogene- 
ous nature of the fibrils. The single fibril density used for modelling 
has very limited resolution. Its density in the z-direction is elongated 
because of a missing wedge issue (no high-angle tilt images). We rely 
on the helical parameters (diameter and the twist lengths) to build a 
helical model, which can be directly measured from the centre slice 
of the tomogram in the x-y plane. Instead of creating a mathematical 
helical model on the basis of the two parameters, we manually built the 
helical model by tracing the filament density using Chimera software”. 
Model dots were placed along the densities followed by manual adjust- 
ment of the dot positions to make the model shape helix-like under 
the restriction of cryo-ET density. Although density restriction was 
applied, the constructed pseudo-helical model does not completely 
fit into the noisy and distorted fibril density. 
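
For comparison, the purely mathematical helix that the authors chose not to use can be generated from the same two parameters. The sketch below is hypothetical: the diameter and twist length are placeholder values, not measurements from the tomograms, and the twist length is taken to mean the axial distance for one full 360° turn.

```python
import math

# Ideal helix defined by a diameter and a twist length (axial rise per turn).
# All numerical values below are placeholders, not measured fibril parameters.

def helix_points(diameter_nm, twist_length_nm, length_nm, n_points):
    """Return (x, y, z) points along an ideal helix running along the z axis."""
    radius = diameter_nm / 2.0
    points = []
    for i in range(n_points):
        z = length_nm * i / (n_points - 1)
        phase = 2.0 * math.pi * z / twist_length_nm
        points.append((radius * math.cos(phase), radius * math.sin(phase), z))
    return points


# Placeholder fibril: 10 nm diameter, 100 nm twist length, 200 nm long
points = helix_points(10.0, 100.0, 200.0, 41)
for x, y, z in points[::10]:
    print(f"x={x:6.2f}  y={y:6.2f}  z={z:6.1f}")
```

Such an idealized curve has a constant twist, whereas the manually traced models follow the measured density and can accommodate the fibril-to-fibril variation in twist length noted above.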


Cytotoxicity assays 

RK13 cells (rabbit kidney cell line, ATCC CCL-37) were grown in DMEM 
medium supplemented with 10% FBS, 1× GlutaMAX, 1× MEM and 1 
mM sodium pyruvate. For toxicity assays, 10,000 cells were plated without 
antibiotic in a 96-well plate and incubated at 37 °C for 24 h. Neuronal 
precursors derived from human induced pluripotent stem cells were 
generated and characterized as previously described”®. These cells 
were maintained in neural precursor expansion medium (NPEM) as 
previously described. Approximately 5,000 cells were plated per well 
in a 96-well plate pre-coated with Geltrex LDEV-free reduced growth 
factor basement membrane matrix (1:100, Invitro- 
gen) and incubated at 37 °C for 24 h. After 24 h, cells were treated 
for either 24 h (RK13 cells) or 48 h (neuronal precursors) with differ- 
ent concentrations of amplified α-syn fibrils originating from CSF 
samples from patients with MSA and patients with PD. Cell viability 
was determined by the MTT assay, following the manufacturer’s 
protocol. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


All data generated and/or analysed during this study are included in 
the Article, Supplementary Fig. 1 (uncropped blots) and the Source 
Data files for Figs. 1, 3, 4 and Extended Data Figs. 1-3. Any additional 
information required is available from the corresponding author on 
reasonable request. 
Extended Data Fig. 1 | Kinetics of α-syn aggregation in the presence of CSF 
from patients with PD, patients with MSA or healthy control individuals. 
a-c, Individual α-syn aggregation curves are shown in the presence of CSF 
samples (40 µl) from all study participants, including healthy controls 
(a; n = 56), patients with MSA (b; n = 75) and patients with PD (c; n = 94). The 
α-syn-PMCA assay was started by adding α-syn monomers (1 mg ml⁻¹) and ThT 
(5 µM) to 100 mM PIPES, pH 6.5 containing 500 mM NaCl. The plate was 
incubated at 37 °C with intermittent shaking for 1 min every 30 min at 500 rpm. 
The extent of aggregation was monitored using a fluorometer to measure ThT 
fluorescence, with an excitation of 435 nm and emission of 485 nm. The colours 
represent the expected aggregation curves for patients with PD (red), patients 
with MSA (blue) and healthy controls (black), regardless of clinical diagnosis. 
Extended Data Fig. 2 | Serial propagation of α-syn aggregates derived from 
patients with MSA and patients with PD. For serial propagation of α-syn 
aggregates, an aliquot of the final product of the first α-syn-PMCA reaction 
(starting from CSF samples) was diluted 100-fold into a solution containing 
fresh α-syn monomers (1 mg ml⁻¹). A second round of amplification was done in 
the same buffer (100 mM PIPES, pH 6.5 containing 500 mM NaCl) at 37 °C with 
intermittent shaking for 1 min every 30 min at 500 rpm. The extent of 
aggregation was monitored by the increase in ThT fluorescence. The maximum 
fluorescence value at the plateau of aggregation was recorded and plotted in 
the graph as the second round of amplification (R2). Similarly, the third and 
fourth rounds of amplification (R3 and R4) were performed by diluting the 
product of amplification 100-fold each time into fresh α-syn monomer 
substrate and repeating the α-syn-PMCA assay. The results shown are from one 
patient with PD and one patient with MSA. The experiment was carried out in 
duplicate; each dot represents an individual technical replicate and data are 
mean ± s.e.m. 
Extended Data Fig. 3 | Analyses of the quantity of α-syn aggregates after 
amplification from patients with MSA and patients with PD by 
sedimentation assay. Aggregates of α-syn that were obtained after two rounds 
of α-syn-PMCA amplification (starting from CSF samples from patients with 
MSA (n = 43) and patients with PD (n = 43)) were centrifuged at 20,000g for 
30 min. a, The resultant pellets were separated on a 12% Bis-Tris gel, and protein 
bands were visualized by silver staining as per the manufacturer’s protocol. 
Molecular weight markers (kDa) are indicated on the left of the gel. 
b, Resuspended pellets (2 µl) were spotted onto nitrocellulose membranes and 
air-dried for 30 min at room temperature. After blocking with 5% w/v non-fat 
dry milk at room temperature for 2 h, membranes were probed with an anti-α- 
syn antibody (BD Bioscience, 1:2,000) and anti-rabbit HRP-conjugated 
secondary antibodies (1:5,000). The blots were visualized using enhanced 
chemiluminescence and a western blotting detection kit. The dot blot shows 
each of the 86 samples (n = 43, PD; n = 43, MSA) and a positive control using non- 
aggregated α-syn monomer (dotted box). The results are representative of two 
independent experiments with similar results. c, Protein concentration in the 
supernatants was determined by a BCA assay kit as per the manufacturer’s 
instructions. Each dot represents an individual sample (n = 43, PD; n = 43, MSA) 
in each disease group and data are mean ± s.e.m. 
Extended Data Fig. 4 | Proteinase K digestion profiles of α-syn aggregates 
derived from samples of CSF from patients with PD and patients with MSA. 
This is the same experiment as Fig. 2a-c, showing proteinase K digestion 
profiles of other representative samples from patients with PD (n = 3) and 
patients with MSA (n = 3). The amplified product from the second round of 
α-syn-PMCA in samples of CSF from patients with MSA or patients with PD was 
incubated either without (−) or in the presence of increasing concentrations of 
proteinase K (0.001, 0.01, 0.1 and 1 mg ml⁻¹) at 37 °C for 1 h. Proteins were 
separated on a 12% Bis-Tris gel and immunoblotted with the same antibodies as 
in Fig. 2 (SC N-19 (top), BD anti-α-syn clone 42 (middle) and SC 211 (bottom)). 
Each blot represents an individual sample. Molecular weight markers (kDa) are 
indicated on the left of the blot. 
Extended Data Fig. 5 | Proteinase K digestion profiles of α-syn aggregates 
derived from samples of CSF from all 43 patients with PD and 43 patients 
with MSA. This is the same experiment as Fig. 2d, showing proteinase K 
digestion profiles of all 86 (n = 43, PD; n = 43, MSA) biologically independent 
[…] PMCA assay were treated with proteinase K (1 mg ml⁻¹) at 37 °C for 1 h. Proteins 
were separated on a 12% Bis-Tris gel and immunoblotted with the BD anti-α-syn 
clone 42 antibody. Molecular weight markers (kDa) are indicated on the left of 
the blot. The third blot on the top row is the same as that shown in Fig. 2d. 
Extended Data Fig. 6 | Proteinase K digestion profiles of α-syn aggregates 
after several rounds of α-syn-PMCA. This is the same experiment as Fig. 2e, 
showing the results obtained with samples from different patients with PD 
(n = 3) and patients with MSA (n = 3). The first round corresponds to direct 
amplification from the CSF of the patients. For the second round of 
amplification, aggregates produced in the first round were diluted 100-fold 
[…] performed. The assay was then repeated for the third and fourth rounds using 
amplified α-syn aggregates (1%) from the previous round. Amplified 
aggregates were treated with proteinase K (1 mg ml⁻¹) for 1 h and proteins were 
separated on a 12% Bis-Tris gel and immunoblotted with the BD anti-α-syn clone 
42 antibody. Molecular weight markers (kDa) are indicated on the left of the 
blot. 
Extended Data Fig. 7 | Electron microscopy images of PD-associated fibrils 
and MSA-associated fibrils. Representative images of fibrils produced after 
two rounds of α-syn-PMCA in samples from different patients with PD (n = 3) 
and patients with MSA (n = 3). The negative-stained fibrils were imaged with a 
300 kV electron microscope. Scale bar, 10 nm (applies to all of the images). 
Extended Data Fig. 8 | Reaction scheme for the chemical synthesis of HS-199. 
HS-199 was synthesized by mixing 0.462 mmol methyl 5’-bromo-[2,2’- 
bithiophene]-5-carboxylate with 0.508 mmol (5-formylthiophen-2-yl)boronic 
acid, K₂CO₃ (1.39 mmol) in 1,4-dioxane/methanol (8:2, 8 mL/mmol, degassed) 
and PEPPSI-IPr (2 mol %). 


Extended Data Table 1 | Number of samples and basic demographic information for all study participants 

Healthy Controls (HC)   Multiple System Atrophy (MSA)   Parkinson’s Disease (PD) 
Age and disease duration are given in years. 

Extended Data Table 2 | Demographic information for the 43 patients with PD and 43 patients with MSA whose CSF samples 
were used to characterize amplified α-syn aggregates in detail 
Age and disease duration are given in years. MSA-C, MSA with cerebellar ataxia; MSA-P, MSA with Parkinsonism. 
Data collection: SerialEM was used for collecting cryo-ET data 

Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size: We used the maximum number of samples that we were able to obtain, which was much higher than needed according to power analysis 


Data exclusions: No data were excluded from the study 


Replication: Results were replicated in independent experiments as described in the text. Also, every experiment included several replicates as described in 
the figure legends. All experiments to replicate the results were successful. 


Randomization All samples were used without any randomization, but based on the maximum number of samples available. The demographic information is 
provided in Tables. Results obtained are not different by gender, race or any variable other than clinical diagnosis. 


Blinding: Except for experiments to optimize the methodologies, the experimenter was always blinded to the sample origin. 
Authentication: Cell lines have been validated by the vendor as stated in the attached document. 
Commonly misidentified lines 
Population characteristics: CSF and brain samples from individuals with a clinical diagnosis of Parkinson's disease, multiple system atrophy and controls were used. 
The demographic characteristics of these patients, including diagnosis, are included in tables as extended data. 


Recruitment: Patients were recruited at Mayo Clinic purely based on clinical diagnosis. There was no special requirement for recruitment that could bias 
the sample population. 
Ethics oversight: The human subject protocol was reviewed and approved at the University of Texas Health Science Center at Houston and Mayo 
Clinic. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 
Antibodies 


1) Polyclonal goat anti-α/β-Synuclein (N-19), Cat. No. sc-7012, Santa Cruz, Dilution 1:2000, Lot. No. 
K1213. 


2) Monoclonal mouse anti-α-Synuclein (211), Cat. No. sc-12767, Santa Cruz, Dilution 1:4000, Lot. No. 
B1518. 


3) Monoclonal mouse anti-α-Synuclein (clone 42), Cat. No. 610787, BD Biosciences, Dot Blot Dilution 
1:2000, Western Blot Dilution 1:5000, Lot. No. 7243572 


4) Sheep anti-mouse IgG conjugated to horseradish peroxidase (HRP), Cat. No. A5906, Sigma Aldrich, 
Dot Blot Dilution 1:5000, Western Blot Dilution 1:10000, Lot. No. SLBT9505 


5) Donkey anti-goat IgG conjugated to horseradish peroxidase (HRP), Cat. No. A15999, Invitrogen, 
Dilution 1:10000 


Validation 


All antibodies were validated according to the manufacturer’s instruction and as described in the 
literature 


1) Commercial antibody, validation data available on manufacturer’s website. Ref: Sharma N, et al. 2001 
Acta Neuropathologica 102(4):329-34. 


2) Commercial antibody, validation data available on manufacturer’s website. Ref: Zunke F, Moise AC, et 
al. 2018 Neuron 97(1):92-107. 


3) Commercial antibody, validation data available on manufacturer's website. Ref: Liu Y, Fallon L, et al. 
2002 Cell 111(2):209-18. 


4) Commercial antibody, validation data available on manufacturer’s website. Ref: Trowitzsch S, Viola C, 
et al. 2015 Nature Communications 6: 6011. 


5) Validated by Manufacturer, see Certificate of Analysis on website 


Bacteria 

BL21 Star™ (DE3)pLysS One Shot™ Chemically Competent E. coli 
Cat. No. 44-0054 

Lot. No. 1437577 


Invitrogen 


Eukaryotic cell lines 


Cell line source: RK13 (ATCC® CCL37™) cell lines were purchased from ATCC. 


Authentication: ATCC provided certificate of analysis for RK13 (ATCC® CCL37™) cell line. 


Cell line source: Neural precursor cells were generated from human iPSC cells reprogrammed from skin 
fibroblasts obtained from a 66-year-old female (cell number: AG08517) purchased from Coriell 
(Camden, NJ, USA). The procedure for reprogramming, differentiation and growing of cells is described in 
detail in our previous publication (Armijo, E. et al. Neurosci. Lett 639, 74-81, 2017). 


Authentication: Cells were fully characterized by various procedures as described in detail in our 
previous publication (Armijo, E. et al. Neurosci. Lett 639, 74-81, 2017). 


Mycoplasma contamination: All cell lines were routinely tested by PCR and were negative for Mycoplasma 
contamination. 


Commonly misidentified lines (See ICLAC register): No commonly misidentified lines were used. 
The biology of haematopoietic stem cells (HSCs) has predominantly been studied 
under transplantation conditions’”. It has been particularly challenging to study 
dynamic HSC behaviour, given that the visualization of HSCs in the native niche in live 
animals has not, to our knowledge, been achieved. Here we describe a dual genetic 
strategy in mice that restricts reporter labelling to a subset of the most quiescent 
long-term HSCs (LT-HSCs) and that is compatible with current intravital imaging 
approaches in the calvarial bone marrow? ©. We show that this subset of LT-HSCs 
resides close to both sinusoidal blood vessels and the endosteal surface. By contrast, 
multipotent progenitor cells (MPPs) show greater variation in distance from the 
endosteum and are more likely to be associated with transition zone vessels. LT-HSCs 
are not found in bone marrow niches with the deepest hypoxia and instead are found 
in hypoxic environments similar to those of MPPs. In vivo time-lapse imaging revealed 
that LT-HSCs at steady-state show limited motility. Activated LT-HSCs show 
heterogeneous responses, with some cells becoming highly motile and a fraction of 
HSCs expanding clonally within spatially restricted domains. These domains have 
defined characteristics, as HSC expansion is found almost exclusively in a subset of 
bone marrow cavities with bone-remodelling activity. By contrast, cavities with low 
bone-resorbing activity do not harbour expanding HSCs. These findings point to 
previously unknown heterogeneity within the bone marrow microenvironment, 
imposed by the stages of bone turnover. Our approach enables the direct visualization 
of HSC behaviours and dissection of heterogeneity in HSC niches. 


At present, tracking of HSCs in live animals requires transplantation 
of the HSCs to be imaged, typically in the calvarium of an irradiated 
recipient whose bone marrow microenvironment has been severely 
altered**. Therefore, although engraftment biology can be studied in 
these models, the behaviour of stem cells and progenitors is likely to 
differ from that seen in the unperturbed state’**. The recent descrip- 
tion of HSC-reporter lines in mice has facilitated the identification of 
these cells in bone sections and after tissue clearing; nevertheless, 
these reporters are still not fully HSC-specific and require the use of 
additional markers’ °. Despite these advances, there is still considerable 
uncertainty about the exact localization of HSC and progenitor cells. 
Even less is known about the nature of distinct niches that support HSC 
proliferation or maintain HSC quiescence’. 


Development of an HSC-specific reporter line 


The expression of the myelodysplastic syndrome 1 (Mds1) gene is 
highly enriched in LT-HSCs”. Mds1 is transcribed from its own pro- 
moter in the Mecom locus, which also produces the well-known EVI1 
gene product and the MDS1-EVI1 gene fusion product”. We targeted 
an EGFP expression cassette to the first transcriptional start site 
of Mds1 (Extended Data Fig. 1a). The resulting allele is predicted to 
be a hypomorph for MDS1 and MDS1-EVI1 but to have no effect on 
the expression of EVI1. Mice heterozygous for the GFP-linked allele 
(Mds1GFP/+) showed normal haematopoietic parameters, frequency 
of HSCs and cell cycle properties, and response to myelosuppression 
(Extended Data Fig. 1b-f). Flow cytometric characterization of these 
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Fig. 1| Generation and characterization of Mds1“""Flt3“ (MFG) mice. 

a, b, Flowcytometric analysis of Mds1"* only mice (n=10) and Mds1°""Flt3“ 
mice (n=13); mean +s.d.c, Cell cycle analysis of GFP’ cells from MFG 

mice versus HSCs isolated by LIN SCA-1°C-KIT'CD150*CD48° (SLAM) 
immunophenotype. Representative analysis shown, depicting data from 
multiple mice (MFG, n=7; SLAM, n=2) that were pooled together to acquire the 
displayed data. d, SPRING plot layout of transcriptomes of 50 single MFG* HSCs 


mice confirmed the complete absence of GFP expression in any mature 
lineage-positive (LIN+) haematopoietic cells (Extended Data Fig. 2a, 
Supplementary File 2). GFP expression was predominantly restricted to 
a small fraction of C-KIT+SCA-1+ cells (Fig. 1a). Using standard phenotypical 
parameters, we found that 28.83 ± 11.99% (mean ± s.d.) of bone 
marrow cells gated solely on GFP could be categorized as LT-HSCs, 
26.61 ± 9.86% as short-term HSCs (ST-HSCs) and 30.12 ± 12.8% as MPPs 
(Fig. 1a, Supplementary File 1). Within the phenotypic LT-HSC compartment, 
approximately 60% of cells expressed GFP, compared to 6% of 
MPPs (Fig. 1b). We refer to this mixed population as haematopoietic 
stem and progenitor cells (HSPCs). Notably, GFP was not expressed in 
non-haematopoietic compartments of the bone marrow (Extended 
Data Fig. 2b, c, Supplementary File 3). 

With the aim of eliminating the labelling of MPPs in the Mds1^GFP/+ 
model, we reasoned that the additional expression of a gene associated 
with early differentiation could facilitate exclusive identification 
of LT-HSCs. Increased brightness of the reporter in phenotypical 
LT-HSCs was inversely correlated with the expression of Flt3, a gene 
whose expression has been associated with the loss of long-term self-renewal 
(Extended Data Fig. 2d). Taking advantage of the fact that 
the GFP coding sequences in the Mds1^GFP allele are flanked by loxP sites 
(Extended Data Fig. 1a), we introduced a Flt3^Cre allele into our model 
(Extended Data Fig. 3a). This allele drives Cre-mediated recombination 
in cells, beginning in the ST-HSC compartment (Extended Data 




projected in published scRNA dataset of HSCs and MPPs. Blue, LT-HSCs 
(n = 789); red, ST-HSCs (n = 742); grey, other cells; bright green, MFG HSCs 
(n = 46). Each dot represents one cell. e, Overall and granulocyte chimaerism 
after transplantation in primary lethally irradiated recipients transplanted 
with 25 MFG+ (n = 6 mice) or SLAM cells (n = 5 mice) from Mds1^GFP/+ Flt3^Cre mice. 
Each line represents an individual mouse. Only engrafted mice are shown. 


Fig. 3b). Characterization of Mds1^GFP/+ Flt3^Cre mice revealed an extremely 
rare GFP+ population (referred to as MFG cells) that corresponds to 
only 0.022 ± 0.013% of the lineage-negative bone marrow (Fig. 1a, b). 
Remarkably, approximately 85% of cells gated solely on the basis of GFP 
resided in the phenotypically defined LT-HSC fraction (Fig. 1a, Extended 
Data Fig. 3e). Another 10% of MFG GFP+ cells displayed slightly lower 
levels of CD150 and might be classified as ST-HSCs (Fig. 1a), and the 
other 5% represent CD150+CD48− cells that express lower levels of SCA1 
and are likely to be megakaryocyte progenitors (MkPs; Extended Data 
Fig. 3c, Supplementary File 1). MFG cells constituted only about 12% of 
the phenotypical LT-HSC population (Fig. 1b). The specificity of LT-HSC 
labelling in MFG mice was recapitulated in bone marrow from multiple 
locations (Extended Data Fig. 3d). MFG cells represented a largely 
quiescent population (Fig. 1c, Extended Data Fig. 3g) that expresses relatively 
high levels of SCA1 and EPCR and little or no CD34 (Extended 
Data Fig. 3e, f), consistent with previously described dormant HSCs. 

To further validate our combined Mds1^GFP/+ Flt3^Cre model, we performed 
single-cell RNA sequencing (scRNA-seq) in cells isolated 
exclusively on the basis of GFP expression. The resulting transcriptomes 
were then projected onto a published single-cell transcriptional 
map of LT-HSCs, ST-HSCs, and multiple MPP populations 
(MPP2/3/4). Strikingly, virtually all MFG+ transcriptomes mapped 
to the most unprimed cluster of cells, in which phenotypic LT-HSCs 
also reside (Fig. 1d, Extended Data Fig. 4a). A small fraction of MFG+ 

Fig. 2 | Steady-state localization and oxygen levels around MFG-HSCs and 
HSPCs. a, Representative intravital images of HSPCs (left, n = 8 mice) and an 
MFG-HSC (right, n = 10 mice) in the calvaria of Mds1^GFP/+ and Mds1^GFP/+ Flt3^Cre 
mice, respectively. GFP+ cells (white arrows) are shown in green, vasculature 
(Angiosense 680EX) in red, auto-fluorescence in blue, and bone (second 
harmonic generation) in white. Scale bars, ~50 µm. b, c, Distance from each 
HSPC (n = 13 and 29 cells from 3 and 4 mice for b and c, respectively) and 


cells showed megakaryocyte lineage priming (Fig. 1d, Extended Data 
Fig. 3c), which has recently been described in multipotent HSCs. 
This analysis also highlights the efficiency of our approach in restricting 
GFP expression to Mds1+Flt3− cells (Extended Data Fig. 4b). MFG 
cells expressed transcripts that were also enriched in dormant HSCs 
(Extended Data Fig. 4c, d). In addition, single-cell quantitative PCR 
analysis of a 280-gene haematopoietic gene panel demonstrated 
clustering of MFG cells with LT-HSCs but no other progenitor signatures 
(Extended Data Fig. 4e). Finally, we performed long-term reconstitution 
assays to assess the potency of MFG cells in comparison to cells 
isolated using traditional flow cytometry markers for HSCs 
(LIN−SCA-1+C-KIT+CD150+CD48−, here referred to as SLAM cells). 
Limiting dilution transplants using 3-25 cells suggested that MFG-HSCs 
are at least as enriched as SLAM cells in transplantation capacity 
(Fig. 1e, Extended Data Fig. 4f). MFG cells also repopulated secondary 
recipients (Extended Data Fig. 4g). In addition, within the 
LIN−SCA-1+C-KIT+CD150+CD48− compartment, long-term repopulating activity was 
enriched in cells expressing GFP (Extended Data Fig. 4h). Thus, our MFG 
animal model allows the isolation of a highly quiescent sub-population 
of LT-HSCs with potent repopulation potential. 


Localization of MFG-HSCs in the calvaria 


Using these two reporter models, we performed imaging of GFP+ cells 
in the calvaria of live mice. As expected, MDS1-GFP HSPCs were more 
prevalent than MFG-HSCs (Fig. 1a, b, 2a). Both cell types were located 
peri-vascularly at an average distance of less than 10 µm from the closest 
vessel (Fig. 2b, Supplementary Videos 1, 2). MFG-HSCs were also found 
similarly close to the endosteum (Fig. 2c), pointing to a possible dual 
endosteal-vascular niche, as suggested previously. However, we 
found that MFG-HSCs were almost exclusively associated with sinusoids 
rather than arterioles (Fig. 2d). While HSPCs also predominantly 

MFG-HSC (n = 20 and 24 cells from 6 and 8 mice for b and c, respectively) to the 
nearest vessel and endosteal surface, respectively, are displayed. d, Identity of 
nearest vessel for each HSPC (n = 16 cells) and MFG-HSC (n = 18 cells). e, Graph of 
in vivo oxygen measurements around individual HSPCs (n = 2 mice, 7 cells) and 
MFG-HSCs (n = 2 mice, 15 cells). P values calculated using two-tailed unpaired 
t-tests; red bars, mean. 


localized close to sinusoids, a significant fraction of these were also near 
transition zone vessels (Fig. 2d) and their distance from the endosteum 
was more varied (Fig. 2c), suggesting that MFG-HSCs and downstream 
HSPCs occupy different micro-niches. 

Given the known developmental and structural differences between 
flat and long bones, we also imaged femurs using a quantitative deep-imaging 
protocol. We identified a very small number of GFP+c-Kit+ 
HSCs in 250-µm-thick, whole-bone femoral sections from MFG mice 
(Extended Data Fig. 5a-e). Approximately 70% of MFG-HSCs were 
located within 5 µm of sinusoidal CD105+ cells, but this was not statistically 
significant in comparison to random dots (Extended Data 
Fig. 5f). In addition, MFG-HSCs did not differ significantly from random 
spots in their distance to the endosteum (about 12% were within 10 µm 
and more than 50% were over 50 µm away; Extended Data Fig. 5b, d, f), 
underscoring the difference between the calvarium and the long bone, 
particularly the diaphysis. 


MFG-HSCs are not found in deep hypoxic zones 


Low oxygen tension (hypoxia) has historically been thought to be a 
shared niche characteristic that is critical for maintaining stem cell 
quiescence. However, support for the existence of a hypoxic niche 
has largely come from indirect evidence and measurements lacking 
spatial resolution. Using an oxygen sensor and two-photon phosphorescence 
lifetime microscopy, we measured the local pO2 surrounding 
individual HSPCs and MFG-HSCs in their native microenvironments 
(Extended Data Fig. 6a-f). First, we confirmed the overall hypoxic status 
of the calvarial bone marrow, with intravascular pO2 in the range of 
15-30 mm Hg (mean ~23 mm Hg, about 3% O2) and extravascular pO2 
in the range of 10-25 mm Hg (mean ~17 mm Hg, about 2% O2; Fig. 2e). 
We then measured pO2 around individual HSPCs and MFG-HSCs, and 
found similar oxygen levels (~18 and ~19 mm Hg) close to the average 

Fig. 3 | Increased motility, expansion, and localization of activated MFG-HSCs. 
a, In vivo motility measurements of HSPCs (n = 12 cells) and MFG-HSCs 
(n = 16 cells) at steady-state over a 2.5-h imaging period. Red bars, mean. P value 
calculated using two-tailed Mann-Whitney test. b, Cell tracks for 16 MFG-HSCs 
over a 2.5-h imaging period. Images were acquired every 30 min. 
c, Representative intravital image of a Cy/GCSF-treated MFG mouse 4.5 days 
after the beginning of treatment. MFG cells (green), vasculature (red, 


extravascular pO2 in the bone marrow (Fig. 2e). Thus, although we 
detected regions in the bone marrow with pO2 as low as 10 mm Hg, 
HSPCs and MFG-HSCs were not found in these regions, suggesting that 
localization to the regions of deepest hypoxia is not a prerequisite for 
maintenance of MFG-HSC quiescence. 
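
The percentage values quoted alongside the pO2 measurements can be reproduced by dividing each partial pressure by the total pressure. The short sketch below assumes one standard atmosphere (760 mm Hg) as the total pressure; that value and the function name are illustrative assumptions and are not taken from the study.

```python
# Illustrative conversion only. The total pressure of 760 mm Hg
# (one standard atmosphere) is an assumption, not a value from the study.
STANDARD_ATMOSPHERE_MMHG = 760.0


def po2_to_percent_o2(po2_mmhg, total_pressure_mmhg=STANDARD_ATMOSPHERE_MMHG):
    """Express an oxygen partial pressure as a percentage of total pressure."""
    if po2_mmhg < 0 or total_pressure_mmhg <= 0:
        raise ValueError("pO2 must be >= 0 and total pressure must be > 0")
    return 100.0 * po2_mmhg / total_pressure_mmhg


def report():
    """Return the conversions for the mean and minimum values quoted in the text."""
    quoted = {
        "intravascular mean": 23.0,
        "extravascular mean": 17.0,
        "lowest region detected": 10.0,
    }
    return {name: round(po2_to_percent_o2(v), 1) for name, v in quoted.items()}


if __name__ == "__main__":
    for name, percent in report().items():
        print(f"{name}: {percent}% O2")
```

Under this assumption, 23 mm Hg and 17 mm Hg correspond to roughly 3% and 2% O2, in line with the rounded values given above.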


Heterogeneous HSC response to activation 


We next examined the dynamic behaviours of HSPCs and LT-HSCs 
in their native niches. In vivo time-lapse imaging of the calvarium 
revealed that MFG-HSCs displayed low baseline motility whereas 
HSPCs showed enhanced motility (Fig. 3a, b, Supplementary Videos 
3-6). To assess whether HSC behaviours would be affected in the 
context of activation, we used a cyclophosphamide (Cy)/G-CSF protocol 
that leads to expansion and subsequent mobilization of LT-HSCs 
(Extended Data Fig. 7a). Fluorescence-activated cell sorting (FACS) 
and imaging analysis demonstrated a tenfold increase in the number 
of MFG cells after treatment (Fig. 3c-e, Supplementary Video 7, 
Extended Data Fig. 7b, c). MFG cells are still enriched in the phenotypical 
LT-HSC fraction in this activated state (Extended Data Fig. 7b) and 
display a concomitant increase in cell cycle activity (Extended Data 
Fig. 7d). Time-lapse imaging of treated animals for about 6 h after the 
third dose of G-CSF (Fig. 3c, Supplementary Video 8) showed that, on 
average, MFG-HSCs became significantly more motile (P = 0.0001), 
with displacement measurements even higher than those of steady-state 
HSPCs (Extended Data Fig. 7e, Supplementary Video 8), but the 
response was heterogeneous, ranging from cells displaying limited 
displacement to a small number of cells that fully exited the bone 
marrow into the blood stream (Supplementary Video 8, Extended Data 
Table 1). Similarly, following treatment with the myeloablative agent 


Angiosense 680EX), auto-fluorescence (blue). Arrows, GFP+ cells. The 
experiment was performed four times with similar results. d, e, Graphical map 
of the locations of MFG-HSCs in the calvaria of untreated and Cy/GCSF-treated 
mice (n = 3 and 4, respectively). Location data from individual mice are 
indicated by different colours. f, Identity of nearest vessel for each MFG+ cell 
(n = 12 cells) after treatment with Cy/GCSF. Compare to untreated mice in 
Fig. 2d. 


5-fluorouracil (5-FU), the number of MFG+ cells increased (Extended 
Data Fig. 8a, Supplementary Videos 9, 11) and a subset exhibited higher 
motility, particularly on day 20 after treatment (Extended Data Fig. 8b, 
Supplementary Videos 10, 12). These data suggest that enhanced motility 
is a common feature of the HSC response to injury, although we 
cannot rule out the possibility that the response is a result of indirect 
action on the niche by Cy/G-CSF or 5-FU. 
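
To make the motility comparison concrete, the sketch below shows one way that displacement could be derived from a time-lapse track. The exact metric used in the study is not specified in this section, so the three quantities computed here (path length, net displacement and mean speed) are assumptions chosen for illustration, and the example track is invented. Only the 30-min frame interval comes from the Fig. 3 legend.

```python
import math

# Frame interval stated in the Fig. 3b legend (images acquired every 30 min).
FRAME_INTERVAL_MIN = 30.0


def path_length_um(track):
    """Sum of step lengths between consecutive (x, y, z) positions, in µm."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


def net_displacement_um(track):
    """Straight-line distance between the first and last positions, in µm."""
    return math.dist(track[0], track[-1])


def mean_speed_um_per_h(track, frame_interval_min=FRAME_INTERVAL_MIN):
    """Path length divided by the imaging duration, in µm per hour."""
    if len(track) < 2:
        raise ValueError("a track needs at least two time points")
    duration_h = (len(track) - 1) * frame_interval_min / 60.0
    return path_length_um(track) / duration_h


# Hypothetical track: six positions at 30-min intervals (2.5 h in total).
EXAMPLE_TRACK = [
    (0.0, 0.0, 0.0),
    (3.0, 4.0, 0.0),
    (3.0, 4.0, 0.0),
    (6.0, 8.0, 0.0),
    (6.0, 8.0, 0.0),
    (6.0, 8.0, 0.0),
]

if __name__ == "__main__":
    print(path_length_um(EXAMPLE_TRACK))
    print(net_displacement_um(EXAMPLE_TRACK))
    print(mean_speed_um_per_h(EXAMPLE_TRACK))
```

Reporting both path length and net displacement helps to separate cells that wander locally from cells that relocate, which is relevant to the heterogeneous response described above.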

Activated MFG-HSCs were found, on average, to be further away 
from the endosteum than native MFG-HSCs (Extended Data Fig. 7f). 
Notably, they were even closer to the vasculature, with an average 
distance of about 1 µm (Extended Data Fig. 7g), and maintained their 
sinusoidal proximity (Fig. 3f). By assessing the distribution of activated 
MFG-HSCs in the entire calvarial region, we identified unique 
patterns of HSC proliferation. First, native MFG-HSCs were found as 
rare single cells within the bone marrow, whereas activated MFG-HSCs 
appeared as clusters (Fig. 3c-e), suggesting that MFG-HSC proliferation 
occurs within spatially restricted domains. Second, a subset 
of MFG+ cells remained as single cells while others formed clusters in 
both the Cy/GCSF and 5-FU models (Fig. 3e, Extended Data Fig. 8c), 
suggesting that the proliferative response is heterogeneous among 
HSCs. To assess whether these clusters were clonal, we generated 
Mds1^CreER/+ Rosa26^Confetti/+ mice (Extended Data Fig. 8d-f). In untreated 
mice, labelled cells were usually found as rare single cells of different 
colours dispersed throughout the bone marrow (Extended Data 
Fig. 8g). Following treatment with Cy/GCSF, we observed labelled cell 
clusters made up predominantly of cells of a single colour (Extended 
Data Fig. 8g, h). Quantitative analysis confirmed that labelled nearby 
cells were more likely to have the same colour than a mixture of colours 
(Extended Data Fig. 8h), providing evidence for clonal HSPC proliferation 
within confined physical domains. 
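
The logic of the colour analysis can be illustrated with a simple permutation test: if clusters are clonal, pairs of nearby labelled cells should share a colour more often than expected when colours are shuffled among the same positions. The study's own quantification is not described in this section, so the statistic, the neighbourhood radius, the cell coordinates and the colour labels below are all hypothetical.

```python
import itertools
import math
import random


def same_colour_fraction(cells, radius_um):
    """Fraction of cell pairs closer than radius_um that share a colour.

    cells is a list of ((x, y, z), colour) tuples.
    """
    same = total = 0
    for (p, c1), (q, c2) in itertools.combinations(cells, 2):
        if math.dist(p, q) <= radius_um:
            total += 1
            same += c1 == c2
    return same / total if total else float("nan")


def permutation_test(cells, radius_um, n_perm=2000, seed=0):
    """Return the observed fraction and a one-sided permutation P value."""
    rng = random.Random(seed)
    observed = same_colour_fraction(cells, radius_um)
    positions = [p for p, _ in cells]
    colours = [c for _, c in cells]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(colours)  # break any link between position and colour
        shuffled = list(zip(positions, colours))
        if same_colour_fraction(shuffled, radius_um) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: two well-separated clusters, each of a single colour.
EXAMPLE_CELLS = [
    ((0.0, 0.0, 0.0), "RFP"), ((5.0, 0.0, 0.0), "RFP"),
    ((0.0, 5.0, 0.0), "RFP"), ((5.0, 5.0, 0.0), "RFP"),
    ((200.0, 0.0, 0.0), "YFP"), ((205.0, 0.0, 0.0), "YFP"),
    ((200.0, 5.0, 0.0), "YFP"), ((205.0, 5.0, 0.0), "YFP"),
]

if __name__ == "__main__":
    fraction, p_value = permutation_test(EXAMPLE_CELLS, radius_um=50.0)
    print(fraction, p_value)
```

Shuffling colours while keeping positions fixed preserves the spatial clustering itself, so the test asks only whether colour is associated with cluster membership.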
Fig. 4 | Heterogeneity of bone remodelling stages governs expansion of 
MFG-HSCs (Mds1^GFP/+ Flt3^Cre mice) and HSPCs (Mds1^GFP/+ mice). a, The double 
calcium staining strategy that identifies D-, M- and R-type cavities. Dye 1, 
delivered 48 h before imaging, shows the old bone front that has been eroded 
to varying extents; dye 2, delivered before imaging, shows the new bone front. 
b-d, Expanded views, showing distinct cavity types defined by the dye 1:dye 2 
pixel ratios. e, A sagittal section of bone marrow cavities containing Mds1^GFP/+ 
cells. f, Fractions of D-, M- and R-type cavities in the calvaria of non-treated or 


HSC expansion restricted by bone remodelling 


The observation of heterogeneous HSC proliferation in restricted 
physical domains prompted us to re-examine the characteristics of 
the microenvironment that either support clonal expansion or maintain 
cell quiescence. Recognizing that the bone is constantly undergoing 
remodelling, we hypothesized that the stages of bone turnover impose 
an additional degree of heterogeneity in the bone marrow microen- 
vironment that is not captured by the prevailing view centred on the 
endosteal versus perivascular duality. To visualize the stages of bone 
turnover, we administered two (spectrally distinct) calcium-binding 
dyes 48 h apart, and imaged the calvarium immediately after the 
second dye injection. The two dyes mark the positions of the old and 
new bone fronts, respectively, and reveal where the old bone front 
has been eroded (Fig. 4a). We quantified the ratio of the two dyes and 
classified the cavities as D-type (undergoing predominantly bone 
deposition), R-type (predominantly bone resorption), and M-type 
(mixed). We confirmed that osteoblasts are biased towards D-type cavities 
while osteoclasts are biased towards R-type cavities. A mixture of 
osteoblasts and osteoclasts was found at intermediate levels in M-type 
cavities (Extended Data Fig. 9a-g). Using this double-staining scheme 
(Fig. 4b-e), we quantified the fractions of D-, M- and R-type cavities in the calvaria 
(Fig. 4f), as well as the spatial distributions of native MFG-HSCs 
and HSPCs. During the steady state, MFG-HSCs were found in baseline 
numbers in all cavity types, while HSPCs tended to be enriched in M-type 

treated mice (two-tailed t-test, n = 155 cavities from non-treated and 80 or 73 
bone marrow cavities from treated animals (n = 3 mice); mean ± s.d.). 
g, Quantification of MFG-HSCs in D-, M- or R-type cavities at steady-state and 
after Cy/GCSF activation. n = 4 mice per group, plotted as different symbols; 
black line represents mean ± s.d. h, Quantification of HSPCs in D-, M- or R-type 
cavities at steady-state and after Cy/GCSF activation. n = 4 mice per group, 
plotted as different symbols. Two-sided Mann-Whitney test used unless 
otherwise specified, ****P < 0.0001, black line represents mean ± s.d. 


cavities (Fig. 4g, h). After activation with Cy/GCSF, however, expanded 
MFG cells were found almost exclusively in a subset of M-type cavities 
(Fig. 4g, Supplementary Video 13, Extended Data Fig. 10a). HSPCs were 
also found to expand preferentially in M-type cavities after activation, 
although this preference was less pronounced (Fig. 4h, Supplementary 
Video 14, Extended Data Fig. 10b). This evidence of heterogeneity in 
types of bone marrow cavity, and of a subset of M-type cavities that 
favours HSC expansion, supports our earlier observation that HSCs 
expand clonally in restricted physical domains. 
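
The cavity classification rests on the dye 1:dye 2 pixel ratio, which can be sketched as a simple thresholding rule. The cut-off values below are hypothetical placeholders, since the thresholds used in the study are not given in this section. The direction of the rule is also an assumption: a cavity in which the old bone front (dye 1) is largely preserved is treated as depositing, and one in which it is largely eroded as resorbing.

```python
from collections import Counter

# Hypothetical cut-offs for the dye 1 : dye 2 pixel ratio; the thresholds
# actually used in the study are not stated in this section.
D_MIN_RATIO = 0.75   # at or above: old bone front mostly preserved -> D-type
R_MAX_RATIO = 0.25   # at or below: old bone front mostly eroded -> R-type


def classify_cavity(dye1_pixels, dye2_pixels,
                    d_min_ratio=D_MIN_RATIO, r_max_ratio=R_MAX_RATIO):
    """Label a cavity as 'D', 'M' or 'R' from its dye 1 : dye 2 pixel ratio."""
    if dye2_pixels <= 0:
        raise ValueError("dye 2 pixel count must be positive")
    ratio = dye1_pixels / dye2_pixels
    if ratio >= d_min_ratio:
        return "D"
    if ratio <= r_max_ratio:
        return "R"
    return "M"


def cavity_fractions(cavities):
    """Fractions of D-, M- and R-type cavities in a list of (dye1, dye2) counts."""
    labels = [classify_cavity(d1, d2) for d1, d2 in cavities]
    counts = Counter(labels)
    return {kind: counts.get(kind, 0) / len(labels) for kind in ("D", "M", "R")}


if __name__ == "__main__":
    example = [(90, 100), (50, 100), (10, 100), (40, 100)]  # invented pixel counts
    print(cavity_fractions(example))
```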


Discussion 


Our work here describes the generation and characterization of an 
animal model in which a single-colour reporter can be used for the 
identification and live imaging of LT-HSCs in the native niche without 
transplantation (Extended Data Table 2). We found evidence of het- 
erogeneity in both the HSC response to injury and the bone marrow 
microenvironment, coupled to the stages of bone remodelling, that 
has not been recognized previously to our knowledge. Notably, 
we also found distinct cavity types in the metaphyses of long bones 
(Extended Data Fig. 11a-f, Supplementary Video 15). The existence of 
distinct types of bone marrow cavity implies that the traditional way of 
characterizing the HSC niche as endosteal or perivascular is inadequate, 
as the microenvironment, including the perivascular niches, contained 
within these cavities is likely to differ depending on the local calcium 


gradient and the downstream effects of osteoclast degradation. To 
fully characterize the regulatory factors that govern HSC quiescence 
versus proliferation, it will be necessary to develop molecular profiling 
technology that can spatially map distinct bone cavities. 


Methods 


Mice and genotyping 

Mds1^GFP/+ mice were generated by cloning and homologous 
recombination of the linearized targeting vector via electroporation 
into v6.5 embryonic stem (ES) cells. After selection with neomycin 
and clonal screening by PCR, correctly targeted ES cell clones were 
injected into C57Bl/6 blastocysts. Derived chimaeras were initially 
bred to C57Bl/6 to obtain germline transmission, followed by crossing 
with FLPe mice to remove the Frt-Neo-Frt cassette that was part of the 
original targeting vector. Derived mice were backcrossed onto a C57Bl/6 
background for more than six generations and mice were analysed via 
PCR to identify their genotype (5'-AGAGTGAAAGACCGAGTGTGTG-3', 
5'-GFACAGGGTAGGCTGCTCAACT-3', 5'-CTCCCTCCCAGCTTTTTGCT-3'). 
Some of the displayed data come from mice that still carried 
the Frt-Neo-Frt cassette, which showed slightly lower mean fluorescence 
intensity in bone marrow cells. A similar strategy was used 
to generate the Mds1^CreER allele. For all experiments, 2-12-month-old 
adult mice of both sexes were used and wild-type littermates were used 
as controls. Flt3^Cre mice, Rosa26-CAG-loxp-stop-loxp-tdTomato 
reporter mice and Rosa26-CAG-loxp-stop-loxp-Confetti reporter 
mice have been described. For identification of Flt3^Cre and Mds1^CreER, 
Cre primers were used (5'-TTACTGACCGTACACCAAAATTTGCC-3', 
5'-CCTGGCAGCGATCGCTATTTTCCATG-3'). Mice were bred and housed 
according to NIH guidelines in our AAALAC-accredited, specific-pathogen-free 
animal care facilities at Boston Children’s Hospital or Massachusetts 
General Hospital. 2.3Collagen1a1-GFP (2.3Col1-GFP) mice 
were generously provided by Dr. Jayaraj Rajagopal (Massachusetts 
General Hospital). All animal protocols were approved by the Animal 
Resources at Children’s Hospital Boston, Boston Children’s Hospital 
Institutional Animal Care and Use Committee, and Massachusetts 
General Hospital Institutional Animal Care and Use Committee. All 
applicable international, national, and/or institutional guidelines for 
the care and use of animals were followed. All results involved in the 
study were acquired according to ethical standards. 


HSC isolation, flow cytometry and cell sorting 

Bone marrow cells were isolated by crushing the bones using a mortar 
and pestle in Ca2+/Mg2+-free phosphate-buffered saline (D-PBS) 
supplemented with 2% fetal bovine serum (FBS) and 1x penicillin/streptomycin 
(Pen/Strep) (Invitrogen). Viable cell number was calculated 
by manually counting with a haemocytometer or using a TC20 
Automated Cell Counter (Bio-Rad). The cell suspension was filtered 
through a 70-µm strainer. For HSC identification via flow cytometry, 
the cells were stained for C-KIT (eBioscience), SCA-1 (eBioscience, 
BioLegend), CD48 (BD Pharmingen) and CD150 (BioLegend) as well 
as a lineage marker cocktail consisting of B220, TER-119, GR-1, CD4 
and CD8a (eBioscience). For experiments requiring lineage deple- 
tion, antibody staining for B220, Ter119, GR-1, CD4 and CD8a biotin- 
conjugated antibodies was first performed followed by application 
of anti-biotin beads (Miltenyi Biotec) and depletion using magnetic 
separation columns (Miltenyi Biotec). For megakaryocyte progenitor 
staining, the cells were stained for lineage marker cocktail, C-KIT, SCA-1, 
CD150, CD41 (BioLegend) and FcγR (eBioscience). For identification 
of common myeloid progenitors (CMPs), granulocyte-monocyte progenitors 
(GMPs) and megakaryocyte-erythroid progenitors (MEPs), 
cells were stained for lineage marker cocktail, C-KIT, SCA-1, CD34 
(eBioscience) and FcγR. For mesenchymal stem cells, cell suspension 
was stained for the lineage cocktail, CD45, PDGFRα and integrin-αV 
(eBioscience). For endothelial cells, lineage cocktail, CD45, CD31 and 
VE-cadherin (eBioscience) were used. For identification of pre and pro 
B cells, immature B cells and mature B cells, B220 and IgM (eBiosci- 
ence) were used. For erythroid cells, the primary marker was Ter-119 
(eBioscience); for monocytes and neutrophils, MAC-1 (eBioscience) and 
LY6-G (BD Pharmingen) were used; and for T cells CD4 and CD8 were 


used. Antibody staining of cell suspensions was always performed on 
ice for 45 min. 4',6-diamidino-2-phenylindole (DAPI, 10 µg/ml in PBS; 
Invitrogen) was used for exclusion of dead cells during flow cytometry. 
Relevant flow cytometry gating strategies for the identification of 
different mature cell populations, LT-HSCs, ST-HSCs, MPP2s, MPP3/4s 
and endothelial cells are available in the Supplementary Information. 
Transplanted cells were double-sorted to increase purity. For transplantation 
of low cell numbers, all secondary sorts were performed in a plate 
using the automated plate reader sorting function. For FACS analysis, 
a BD LSRII Flow Cytometer was used, while cell sorting was performed 
using a BD FACSAria II sorter (BD Biosciences). Flow cytometry data 
were analysed using FlowJo (Tree Star). 


Cell cycle analysis 

As each animal contained an average of only 600-700 MFG sorted 
cells, the cell cycle analyses shown in Fig. 1c and Extended Data 
Fig. 3g represent GFP+ cells isolated from seven Mds1^GFP/+ Flt3^Cre mice; 
thus, the data represent an average from seven mice that were pooled 
together. Upon identification and sorting purification of corresponding 
cellular populations as described above, cells were fixed in ice-cold 
70% ethanol. Cells were then washed and stained with Ki67 (BioLegend) 
for 30 min on ice to distinguish G0/G1 phase. DAPI was finally used for 
staining and analysis of G1/G2 versus M/S phase. Cell cycle analysis was 
performed using a BD FACSAria II sorter. 


Competitive reconstitution assays in irradiated mice and 
peripheral blood analysis 

Bone marrow transplant recipients were 8-12-week-old B6.SJL-Ptprc^a 
Pepc^b/BoyJ (CD45.1) mice. Before transplantation, mice were lethally 
irradiated using a gamma irradiator with a split dose of 11 Gy with a 
3-h interval between the two doses. Cells were transplanted via retro- 
orbital injection into anaesthetized mice. One hundred thousand whole 
bone marrow CD45.1 cells were used as competitors unless stated oth- 
erwise. For limiting dilution studies, HSC frequency was calculated 
using Extreme Limiting Dilution Analysis software 
(http://bioinf.wehi.edu.au/software/elda/) with data taken 4 months after 
transplantation. The lower stem cell frequency reported here might be due to the 
incomplete backcrossing of MFG mice (six generations), the presence 
of constitutive CRE in haematopoietic cells and/or technical reasons. 
For secondary transplants, two million whole bone marrow cells from 
primary recipients were transplanted into lethally irradiated secondary 
recipients. For blood analysis of transplanted recipients, blood was 
collected at 4-week intervals for at least 16 weeks after transplanta- 
tion. Peripheral blood was first treated with red blood cell lysis buffer 
to remove red blood cells followed by antibody staining for B cells 
(CD19, eBioscience), T cells (CD4, CD8a) and granulocytes (Ly6-G). 
The percentage chimaerism was estimated using CD45.1 (BioLegend) 
and CD45.2 (eBioscience) antibody staining. 
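
The principle behind limiting dilution analysis can be sketched with the single-hit Poisson model, in which a recipient fails to engraft with probability exp(-f × dose) for stem cell frequency f. The code below is a minimal maximum-likelihood sketch under that model and is not the ELDA software itself: it gives a point estimate only, without the confidence intervals or model checks that ELDA reports. The function names and the example group are hypothetical.

```python
import math


def score(freq, groups):
    """Derivative of the single-hit Poisson log-likelihood with respect to freq.

    groups is a list of (cell_dose, n_recipients, n_engrafted) tuples.
    """
    total = 0.0
    for dose, n_recipients, n_engrafted in groups:
        p_none = math.exp(-freq * dose)  # probability that no stem cell was delivered
        total += n_engrafted * dose * p_none / (1.0 - p_none)
        total -= (n_recipients - n_engrafted) * dose
    return total


def estimate_frequency(groups, tol=1e-12):
    """Maximum-likelihood stem cell frequency (per cell) found by bisection."""
    engrafted = sum(g[2] for g in groups)
    tested = sum(g[1] for g in groups)
    if engrafted == 0 or engrafted == tested:
        raise ValueError("need both engrafted and non-engrafted recipients")
    lo, hi = 1e-12, 1.0
    while score(hi, groups) > 0:  # widen the bracket until the score changes sign
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, groups) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical single group: 10 recipients of 10 cells each, 5 engrafted.
    freq = estimate_frequency([(10, 10, 5)])
    print(f"estimated frequency: 1 in {1.0 / freq:.1f} cells")
```

For a single dose group the estimate has a closed form, f = -ln(fraction not engrafted) / dose, which offers a quick check on the numerical search.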


Blood cell counts and treatment with 5-FU, Cy/GCSF and 
tamoxifen 

Blood samples were collected via the retro-orbital vein in EDTA-coated 
tubes. Blood cell counts were performed using a HEMAVET950 (Drew 
Scientific) cell blood counter. For blood cell kinetic analysis upon 5-FU 
treatment, cell counts were performed on days 0, 3, 7, 10, 13 and 17. 5-FU 
was delivered via retro-orbital injection as a single dose of 150 mg/kg 
immediately after the day 0 blood sample was collected, while control 
mice were injected with PBS. In addition, bone marrow from treated 
mice was analysed using flow cytometry or imaging was performed 
at the indicated time points. For Cy/GCSF experiments, cyclophos- 
phamide was delivered via intraperitoneal injection as a single dose 
of 200 mg/kg on day 1, followed by subcutaneous injection of GCSF 
on days 2, 3 and 4 at 250 µg/kg per day followed by bone marrow flow 
cytometry analysis or live animal imaging of the calvarial bone marrow. 
For Mds1^CreER/+ Rosa26^Confetti/+ mouse experiments, cyclophosphamide 
or PBS was administered on day 1 followed by GCSF and tamoxifen 
injection on day 2 and further GCSF or PBS administration on days 3 
and 4 according to each experiment. Tamoxifen was administered at 
a 2 mg dose via intraperitoneal injection to target labelling to the LT-HSC 
compartment. Bone marrow analysis of femurs and calvaria was 
performed 24 h after the final GCSF dose according to each experiment. 
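
As a worked example of the weight-based dosing above, the sketch below converts the stated per-kilogram doses into absolute amounts for a single animal. The 25 g body weight is a hypothetical value chosen for illustration, and the function name is not part of any published protocol.

```python
def dose_mg(body_weight_g, dose_mg_per_kg):
    """Absolute dose in mg for an animal of the given weight in grams."""
    if body_weight_g <= 0 or dose_mg_per_kg < 0:
        raise ValueError("weight must be positive and dose non-negative")
    return dose_mg_per_kg * body_weight_g / 1000.0


# Per-kilogram doses stated in the text, expressed in mg/kg.
DOSES_MG_PER_KG = {
    "5-FU": 150.0,
    "cyclophosphamide": 200.0,
    "GCSF (per day)": 0.25,  # 250 µg/kg
}

if __name__ == "__main__":
    weight_g = 25.0  # hypothetical adult mouse
    for agent, per_kg in DOSES_MG_PER_KG.items():
        print(f"{agent}: {dose_mg(weight_g, per_kg):.5g} mg")
```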


Single cell inDrops RNA sequencing 

GFP+ cells were sorted from Mds1^GFP/+ Flt3^Cre mice and single cells were 
encapsulated using droplet microfluidic technology as previously 
described. Although 1,200 GFP+ cells were sorted, only about 400 
cells were encapsulated; the loss of more than half of the population 
is standard for low cell number populations in the inDrop platform. 
Upon library preparation of barcoded single cells, RNA sequencing 
was performed. To process the data, we used a previously published 
workflow and code available at https://github.com/AllonKleinLab/ 
SPRING. Ensembl release 81 mouse mm10 cDNA plus the sequence of 
loxP-IRES-GFP-polyA-loxP was used as reference. SPRING plots were 
generated using the following four-step process. Initially, cells with few 
mRNA counts (<1,000 unique molecular identifiers) and stressed cells 
(mitochondrial gene-set Z-score >1) were filtered out. The remaining 
high-quality cells were total-counts normalized. We next filtered genes, 
keeping those that were well detected (mean expression >0.05) and 
highly variable (CV >2). Finally, the data were normalized by Z-scoring 
each gene and applying principal components analysis (PCA), retain- 
ing the top 50 PCs. Following filtration and bioinformatics analysis, 
only 50 GFP+ cells passed quality control and were used for plotting. 
The data acquired from the GFP+ cells were then plotted with previously 
published data for LSK cells upon transformation in the PCA 
space of the previously published data. In brief, the two datasets were 
integrated using the library sklearn.decomposition.PCA (Python 2.7). 
The fit function was used to calculate the first 50 PCs for the single cell 
LSK dataset. Then, the normalized and filtered count matrices of the 
GFP and LSK cells were vertically combined. Before combining the two 
matrices, the GFP matrix was scaled in order to have a comparable 
amount of normalized counts in correlation to the LSK matrix. The 
resulting Z-score-combined matrix was used as input for the transform 
function to project the combined dataset onto the original LSK dataset. 
The output generated by the transform function and the correspond- 
ing distance matrix, which was obtained using the SPRING function 
get_distance_matrix, were used to generate the final SPRING plot. All 
data were visualized as previously described. This reduced any batch 
effects between the two experiments. The coordinates generated by 
the SPRING plots were plotted using R. 
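
The filtering and projection steps described above can be sketched as follows. This is an illustration on synthetic data and not the published workflow: NumPy's SVD stands in for the fit and transform functions of sklearn.decomposition.PCA, the mitochondrial stress filter is omitted, and genes are Z-scored using the reference (LSK) statistics, which is an assumption. All function names and the synthetic matrices are hypothetical.

```python
import numpy as np


def filter_cells(counts, min_umis=1000):
    """Keep cells (rows) with at least min_umis total counts."""
    return counts[counts.sum(axis=1) >= min_umis]


def normalize_total(counts, target_total):
    """Scale every cell to the same total count."""
    return counts / counts.sum(axis=1, keepdims=True) * target_total


def variable_genes(norm, min_mean=0.05, min_cv=2.0):
    """Boolean mask of well-detected, highly variable genes (columns)."""
    mean = norm.mean(axis=0)
    std = norm.std(axis=0)
    cv = np.divide(std, mean, out=np.zeros_like(std), where=mean > 0)
    return (mean > min_mean) & (cv > min_cv)


def fit_reference_pcs(ref_norm, n_components=50):
    """Z-score genes on the reference and return its leading principal axes."""
    mean = ref_norm.mean(axis=0)
    std = ref_norm.std(axis=0)
    std[std == 0] = 1.0
    z = (ref_norm - mean) / std  # columns of z have zero mean
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, std, vt[:n_components]


def project(norm, mean, std, components):
    """Project cells onto the reference principal axes."""
    return ((norm - mean) / std) @ components.T


def demo(n_components=5, seed=0):
    """Run the steps on synthetic counts (300 reference and 25 query cells)."""
    rng = np.random.default_rng(seed)
    reference = filter_cells(rng.poisson(60.0, size=(300, 40)).astype(float))
    query = filter_cells(rng.poisson(60.0, size=(25, 40)).astype(float))
    target = reference.sum(axis=1).mean()
    ref_norm = normalize_total(reference, target)
    query_norm = normalize_total(query, target)  # query scaled to match reference
    # Synthetic Poisson counts are not very variable, so the CV cut-off is relaxed.
    keep = variable_genes(ref_norm, min_mean=0.05, min_cv=0.0)
    mean, std, comps = fit_reference_pcs(ref_norm[:, keep], n_components)
    combined = np.vstack([ref_norm[:, keep], query_norm[:, keep]])
    return project(combined, mean, std, comps), len(reference)


if __name__ == "__main__":
    embedding, n_reference = demo()
    print(embedding.shape, n_reference)
```

Fitting the principal axes on the reference alone and then transforming the combined matrix means that the query cells cannot influence the axes, so they are placed within the structure of the published dataset.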


Single cell Fluidigm analysis 

GFP+ cells were first sorted from Mds1^GFP/+ Flt3^Cre mice followed 
by secondary single cell sorting directly in 96-well plates containing 
PCR buffer. Sorted plates were frozen on dry ice followed by reverse 
transcription, pre-amplification and high-throughput microfluidic 
real-time PCR for 180 transcription factors as previously described. 
Data analysis and hierarchical clustering were performed using the 
MultiExperiment Viewer (MeV) program. Previously published data for 
CMPs, GMPs, MEPs, common lymphoid progenitors (CLPs), MPPs and 
LT-HSCs using the same 180-gene real-time PCR platform were overlaid for 
comparison to the GFP+ cells. 


Synthesis and characterization of phosphorescent Oxyphor 
PtG4 probe 

The structure of Oxyphor PtG4 is almost identical to that of the previously 
published Oxyphor PdG4 probe. The synthesis of the core porphyrin 
and the synthesis of a dendritic probe similar to Oxyphor PtG4 
have been published previously. All synthesis steps were identical 
to those developed for the synthesis of PdG4. Matrix-assisted laser 
desorption/ionization time of flight (MALDI-TOF) mass spectrometry 


was used to confirm the identity of the intermediate products and of 
the target probe molecule. Calibration, assessment of phosphores- 
cence oxygen quenching and absorption spectra measurements for 
the Oxyphor PtG4 probe were performed as previously described?" 


In vivo and ex vivo imaging 

All in vivo imaging experiments were performed according to procedures
approved by the Institutional Animal Care and Use Committee
at Massachusetts General Hospital. In brief, mice were either anaesthetized
with an induction dose of 3-4% isoflurane (96% O2) and a maintenance
dose of 1.25-2% isoflurane or given an intraperitoneal bolus
injection of ketamine (100 mg/kg) and xylazine (15 mg/kg). Animals
were deemed anaesthetized by the toe pinch method. To minimize
pain, mice were treated with buprenorphine (0.05-0.1 mg/kg). The hair
on the calvarium was removed with scissors or a mechanical trimmer
and then the skin was wiped with alcohol. Next, a calvarial skin flap
was created with a U-shaped incision to reveal the underlying calvarial
bone as previously described””. Mice were injected with imaging agents
(for example, vascular labels) retro-orbitally, mounted in a custom-designed
heated mouse holder, and secured to the stage of a home-built
multiphoton/confocal laser-scanning video-rate microscope (for
z-stack or time-lapse imaging) or an Olympus FVMPE-RS multiphoton
imaging platform (for oxygenation measurements)”. A drop of 0.9%
saline was applied to the skull to act as the immersion fluid, and a Zeiss
63x 1.15 numerical aperture water-dipping objective, an Olympus 60x
1.0 numerical aperture water-dipping objective, or an Olympus 25x 1.05
numerical aperture water-dipping objective was used for all imaging.
For endpoint imaging, mice were killed while under anaesthesia using
approved procedures. For survival imaging, the skin flap was closed
with 6-0 vinyl sutures (Ethicon). Triple antibiotic ointment (bacitracin,
neomycin, and polymyxin-B sulfate) was applied to the top of the surgical
site to minimize the chance of infection. Mice were put in a heated
cage and monitored until fully awake. For 6-h imaging sessions, mice
were given an intraperitoneal injection of ~100 µl 0.9% saline solution
every hour to ensure proper hydration.

GFP was excited at 491 nm (confocal) or 950 nm (two-photon) and
collected at ~505-540 nm using a photomultiplier tube. Angiosense
680EX (~100 µl at 2 nmol/100 µl, Perkin-Elmer) for labelling the vasculature
was excited at 635 nm (confocal only) and collected at ~665-725 nm
using a photomultiplier tube. Autofluorescence generated from the 491
nm or 950 nm excitations was collected at ~570-610 nm using a photomultiplier
tube. Second harmonic generation (SHG) from collagen in
the bone was excited at 775 nm or 840 nm (two-photon only) and collected
at ~340-460 nm with a photomultiplier tube. Phosphorescence
was excited at 1,150 nm by a Ti:Sa femtosecond laser (Insight, Spectra-Physics)
and collected above 750 nm with a photomultiplier tube. For
calcium staining of endosteal bone fronts, calcein blue (30 mg/kg),
tetracycline (35 mg/kg, Sigma), and Alizarin red S (40 mg/kg) were
excited at 775 nm and collected at 415-455 nm, 500-550 nm, and
580-650 nm, respectively. Rhodamine B dextran 70 kDa (0.5 mg/50 µl,
Sigma) was used as a vascular contrast with Cat K 680 FAST (2 nmol/100 µl
injected 6 h before imaging, Perkin-Elmer) for labelling osteoclasts,
excited simultaneously at 532 nm and 638 nm and collected at 570-620 nm
and 665-745 nm.

For steady-state in vivo imaging, 15-60 frames from the live video
mode were averaged to acquire single 500 x 500 pixel images. Z-stacks
were acquired with 1-2-µm steps and time-lapse images were acquired
at 30-s intervals for 20 min or longer. For Cy/GCSF in vivo imaging,
z-stacks were acquired with 2-µm steps every 20 min for ~6 h. Calvarial
cell location maps in Fig. 3d, e were created in Matlab using custom code
based on the x, y, and z coordinates of each cell. Data from each mouse
were aligned and then overlaid using the locations of the coronal and
sagittal sutures.

For pO2 measurements, ~75 µl of 1.7 mM Pt-G4 suspended in 0.9%
PBS (1x PBS, Invitrogen) was injected intravenously before imaging.
In each pO2 measurement location, multiple pulse excitation/emission
cycles were used to record the phosphorescence lifetime decay. For each
cycle, the probe was excited for ~10-20 µs followed by ~150 µs for time-resolved
photon collection. Quantitative pO2 values were obtained by
fitting the phosphorescence decay with a single exponential to get an
average lifetime of phosphorescence. This lifetime value was converted
to pO2 using an in vitro calibration curve for the same batch of Pt-G4.
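
As a minimal sketch of the lifetime-to-pO2 conversion described above (not the authors' analysis code), the example below fits a single exponential to a synthetic decay with a log-linear least-squares fit and then interpolates pO2 from a calibration table. The function names, the synthetic decay and the calibration points are all hypothetical; the real calibration for Pt-G4 is batch-specific, and table interpolation is used here only as a stand-in for the fitted calibration curve.

```python
import numpy as np

def fit_lifetime(t_us, counts):
    """Single-exponential lifetime (in µs) from a log-linear least-squares fit."""
    t_us = np.asarray(t_us, dtype=float)
    counts = np.asarray(counts, dtype=float)
    keep = counts > 0  # the logarithm is undefined for empty bins
    slope, _intercept = np.polyfit(t_us[keep], np.log(counts[keep]), 1)
    return -1.0 / slope

def lifetime_to_po2(tau_us, cal_tau_us, cal_po2):
    """Interpolate pO2 (mm Hg) from a lifetime using a calibration table."""
    cal_tau_us = np.asarray(cal_tau_us, dtype=float)
    cal_po2 = np.asarray(cal_po2, dtype=float)
    order = np.argsort(cal_tau_us)  # np.interp requires increasing x values
    return float(np.interp(tau_us, cal_tau_us[order], cal_po2[order]))

# Synthetic, noise-free decay sampled over a 150-µs collection window.
t = np.linspace(0.0, 150.0, 151)
decay = 1000.0 * np.exp(-t / 30.0)
tau = fit_lifetime(t, decay)

# Hypothetical calibration points (lifetime in µs, pO2 in mm Hg); illustration only.
cal_tau = [47.0, 40.0, 30.0, 20.0, 16.0]
cal_po2 = [0.0, 10.0, 35.0, 100.0, 160.0]
po2 = lifetime_to_po2(tau, cal_tau, cal_po2)
```

With real photon counts, a weighted or nonlinear fit may be preferable because the log transform amplifies noise in the tail of the decay.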

For 5-FU imaging experiments, 5-FU was delivered via retro-orbital
injection as a single dose of 150 mg/kg as described above. In vivo bone
marrow two-photon imaging was then performed on day 4 or day 20
after the 5-FU injection. The calvarial cell location map in Extended
Data Fig. 8c was made in a similar way to the Cy/GCSF cell maps described
above.

For Cy/GCSF ex vivo imaging, freshly fixed (4% paraformaldehyde)
and excised mouse calvaria were affixed to a plastic dish, immersed in
1x PBS, and immediately imaged for 4-5 h. Tiled z-stacks were acquired
with 3-µm steps and a 10% overlap between fields of view. Images were
stitched together in 3D using Olympus FluoView software or ImageJ
scripts.

For examining bone remodelling activities in calvaria (in vivo) and 
metaphysis (ex vivo), calcium binding dyes were administered 48 h 
apart via retro-orbital injection. Calvarial in vivo imaging was per- 
formed as described. Mouse tibia was freshly removed, thinned, and 
imaged from the bone surface. Tiled z-stacks were acquired with 3-µm
steps and stitched using ImageJ.


Image quantification 

For distance measurements, the distance from each cell to blood vessels 
or to the nearest bone surface (that is, endosteum identified using SHG)
was computed by hand as described previously using the Pythagorean
theorem”. The bone contains abundant collagen, which enabled us 
to use SHG imaging to identify the inner bone surface (endosteum). 
This technique has been used in many previous publications of live 
bone marrow imaging??”7**, 

The identity of blood vessels (arterioles, sinusoids, or transition 
vessels) within the calvaria was determined by a combination of mor- 
phology, location within the vessel network, location within the bone 
marrow, and blood flow. In brief, with our blood pool agent the arterioles
appear as narrow (~5-10 µm diameter) and generally straight
vessels with a smooth surface upstream of sinusoidal vessels, which
are larger (~20 µm diameter or greater) with irregular surfaces. This
definition was based on previous work?”“4, which confirmed that these
small-diameter vessels are arterioles with a faster flow speed (~2 mm/s
or higher), higher pO2, and increased barrier function in comparison to
sinusoidal vessels. They also stain positive for SCA1. At the transition
point between arterioles and sinusoids, the vessel diameter increases.
We define transitional vessels as the segment from this point of increase
to the next vessel branching point downstream.

Distance measurements were performed in ImageJ v.1.51p. For display
purposes, the brightness and contrast of images in the figures were
adjusted, but all image analysis was performed on raw data. For motility
measurements, frame-to-frame drift was corrected in 3D using the
Template Matching plugin in ImageJ. Next, the centroid of the cell was
determined for the first and last image of a 20-min sequence, and the
2D displacement was calculated using the distance formula.
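
The displacement calculation above reduces to the Euclidean distance between the first and last centroids. A minimal sketch follows; the function name and the example coordinates are hypothetical, and drift correction is assumed to have been applied already.

```python
import math

def displacement_2d(first_xy, last_xy):
    """Euclidean distance between two (x, y) centroids, in the input units."""
    dx = last_xy[0] - first_xy[0]
    dy = last_xy[1] - first_xy[1]
    return math.hypot(dx, dy)

# Hypothetical centroids (µm) in the first and last frames of a 20-min sequence.
start = (12.0, 30.0)
end = (15.0, 34.0)
d = displacement_2d(start, end)
mean_speed_um_per_min = d / 20.0
```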

For cell clustering analysis, the individual tiled z-stack images
were reconstructed into a single z-stack for the whole calvaria using
ImageJ. Next, each cell was assigned one of three tags (red, green,
or blue) based on the colour of the cell during imaging, and the x, y,
z coordinates were recorded by hand. Using a custom Matlab script
similar to ClusterQuant*, we analysed the spatial clustering (cluster
size = 3) of like-coloured cells in this model compared to 10,000
randomized samples to determine the statistical likelihood of the
colour clustering in our samples. Graphs and statistical analyses were
performed using GraphPad Prism version 6 or higher. The contrast
and/or brightness of figure images and videos were adjusted for
display purposes only.
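
The randomization test for colour clustering can be sketched as a permutation test. This is an illustration under stated assumptions, not the authors' Matlab script or the ClusterQuant algorithm: here a "cluster" is taken to be a cell plus its two nearest neighbours (size 3), the statistic is the fraction of such clusters that are single-coloured, and all function names and coordinates are hypothetical.

```python
import numpy as np

def neighbour_indices(coords, k=3):
    """Indices of each cell and its k-1 nearest neighbours (cluster size k)."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(dist, -1.0)  # ensure each cell is listed first in its own cluster
    return np.argsort(dist, axis=1)[:, :k]

def purity(colours, neighbours):
    """Fraction of clusters in which every member has the same colour."""
    colours = np.asarray(colours)
    return float((colours[neighbours] == colours[:, None]).all(axis=1).mean())

def permutation_test(coords, colours, n_perm=10_000, k=3, seed=0):
    """Compare observed purity with purity after shuffling colour labels."""
    rng = np.random.default_rng(seed)
    colours = np.asarray(colours)
    nb = neighbour_indices(coords, k)
    observed = purity(colours, nb)
    null = np.array([purity(rng.permutation(colours), nb) for _ in range(n_perm)])
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, float(p_value)

# Made-up x, y, z coordinates: three tight, well-separated groups of three cells.
coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0),
          (50, 50, 5), (51, 50, 5), (50, 51, 5),
          (100, 0, 10), (101, 0, 10), (100, 1, 10)]
colours = ["red"] * 3 + ["green"] * 3 + ["blue"] * 3
observed, p_value = permutation_test(coords, colours)
```

Shuffling the labels keeps the cell positions and the colour frequencies fixed, so the null distribution reflects only the chance arrangement of colours.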


Classification of bone marrow cavities 

A bone marrow cavity is defined as a 3D inclusion inside bone with a
single concave endosteum (Fig. 4e, Extended Data Fig. 11d-f), while 
deeper down all cavities are interconnected. Once a cavity has been 
defined using the bone SHG signal, we classified types of bone marrow 
cavity by sequential staining with two calcium-binding reagents. The 
first calcium-binding dye (dye 1, tetracycline or calcein blue, Sigma; 
35 mg/kg and 30 mg/kg, respectively) was administered 48 h before 
imaging to track bone resorption activities based on erosion of dye 1,
and the second calcium-binding dye (dye 2, Alizarin red, 40 mg/kg) 
was administered 30 min before imaging to label high-calcium regions 
(bone fronts). The 48-h interval was chosen on the basis of the esti- 
mated lifespan of mouse active osteoclasts***”. Therefore, the double- 
staining approach delineated approximately one bone erosion cycle 
in the bone marrow. As the lack of dye 1 indicates the existence of 
resorption whereas strong double staining with both dyes indicates 
ongoing bone deposition, the dye 1:dye 2 ratio contained within a single 
concave endosteum depicts the status of bone remodelling during 
the 48-h period. For each cavity, the acquired depth covered the dye 
1- and dye 2-labelled regions, typically between 80 and 120 µm beneath
the endosteum. For quantification of osteoblast or osteoclast coverage
(2.3Col1GFP or cathepsin K pixels) along the endosteum, z-stack
double staining, COL1, and cathepsin K images of 2-µm z-steps were
rendered in 2D using maximum intensity projection in ImageJ and then 
analysed (Extended Data Fig. 9). As cathepsin K is also expressed sub- 
stantially by endothelial cells, a vascular map (rhodamine B dextran) 
was acquired simultaneously and subtracted from the cathepsin K 
map before we retrieved the total pixel counts. For quantifying frac- 
tions of cavity types, 3D maps of calvaria were acquired and rendered 
in 2D using maximum intensity projection, then analysed (Fig. 4f). 
For quantification of cavity types for Fig. 4g, h, total pixels of dye 1
and dye 2 were retrieved directly from the 3D stacks. Segmentation 
of dye 1 and dye 2 in each stack was obtained using ImageJ macros 
combining multiple built-in plugins. Specifically, contrast enhance- 
ment was applied consistently (0.1% saturation) for each stack. The 
images were smoothed using 3D ImageJ Suite plugins*® (3D mean filter,
kernel size = 1) followed by background subtraction using the rolling 
ball algorithm with radius size of 100 and 250 pixels for dye 1 and dye 
2, respectively. This background subtraction step removed diffuse 
signals from bone autofluorescence that are more prominent in blue 
and green channels (dye 1) but still distinct from structured patterns of 
bone front staining. Segmentation was then performed using ImageJ 
built-in global or local thresholding algorithms to render matching 
binary results compared to raw stacks. The total numbers of dye 1 and
dye 2 pixels were then obtained from the binary images to calculate 
the dye 1:dye 2 ratio. 
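
The projection and vascular-subtraction steps described above can be illustrated with NumPy. This is only a sketch: the fixed threshold stands in for the ImageJ thresholding algorithms, the toy stacks are made up, and the function names are hypothetical.

```python
import numpy as np

def max_intensity_projection(stack):
    """Collapse a (z, y, x) stack to 2D by taking the maximum along z."""
    return np.asarray(stack).max(axis=0)

def remove_vascular_signal(catk_mask, vessel_mask):
    """Keep cathepsin K pixels that do not overlap the vascular map."""
    return np.logical_and(catk_mask, np.logical_not(vessel_mask))

# Toy 2 x 3 x 3 stacks (8-bit intensities); values are made up.
catk_stack = np.zeros((2, 3, 3), dtype=np.uint8)
catk_stack[0, 0, 0] = 200
catk_stack[1, 1, 1] = 180
catk_stack[1, 2, 2] = 150  # this signal coincides with a vessel
vessel_stack = np.zeros((2, 3, 3), dtype=np.uint8)
vessel_stack[0, 2, 2] = 220

threshold = 100  # assumed fixed threshold, for illustration only
catk_mask = max_intensity_projection(catk_stack) > threshold
vessel_mask = max_intensity_projection(vessel_stack) > threshold
osteoclast_mask = remove_vascular_signal(catk_mask, vessel_mask)
osteoclast_pixels = int(np.count_nonzero(osteoclast_mask))
```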

For classification purposes, we defined bone cavities as (i) deposition
type (D-type; dye 1:dye 2 > 75%); (ii) resorption type (R-type; dye 1:dye
2 < 25%); and (iii) mixed type (M-type; dye 1:dye 2 25-75%). These definitions
emphasize functional perspectives of bone remodelling along the
endosteum (dominated by bone deposition or resorption), instead of
the presence of osteoblasts or osteoclasts at the time of imaging. This
is especially important given that osteoclasts undergo apoptosis
after each resorption cycle and may not be present at the time of imaging.
Of note, bone-lining cells have been reported to occupy the bone
fronts of inactive regions that lack both mature osteoblasts and calcium
staining”. In the calvarium, as neither MDS-HSPCs nor MFG-HSCs
were found in fully inactive regions, we characterized cell distribution
only in D, M and R cavities, where small patches of inactive areas may
be present but do not alter the distribution of the three cavity types.
To quantify the number of cells per bone marrow cavity (Fig. 4g, h),
Mds1GFP/+ Flt3Cre and Mds1GFP/+ cells were manually counted. A cell was
considered to belong to a cavity if it was underneath a concave dome;
there was no restriction on its distance from the endosteal surface.
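
The classification thresholds above translate directly into a small function. The sketch below is illustrative only: the function names and the example pixel and cell counts are hypothetical, and ratios of exactly 25% or 75% are assigned to the mixed type, consistent with the stated 25-75% range.

```python
from collections import Counter

def classify_cavity(dye1_pixels, dye2_pixels):
    """Return 'D', 'M' or 'R' from the dye 1:dye 2 pixel ratio (in per cent)."""
    if dye2_pixels <= 0:
        raise ValueError("dye 2 pixel count must be positive")
    ratio = 100.0 * dye1_pixels / dye2_pixels
    if ratio > 75.0:
        return "D"  # deposition type
    if ratio < 25.0:
        return "R"  # resorption type
    return "M"      # mixed type (25-75%)

def cell_fractions(cavities):
    """Fraction of cells per cavity type; cavities = [(dye1, dye2, n_cells), ...]."""
    totals = Counter()
    for dye1, dye2, n_cells in cavities:
        totals[classify_cavity(dye1, dye2)] += n_cells
    n_total = sum(totals.values())
    return {kind: totals[kind] / n_total for kind in ("D", "M", "R")}

# Made-up pixel and cell counts for three cavities in one calvaria.
example = [(900, 1000, 2), (500, 1000, 5), (100, 1000, 3)]
fractions = cell_fractions(example)
```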


Bone clearing and imaging of femurs 

Tissue preparation, multicolour full-bone imaging of thick femoral 
sections and quantitative analysis were performed as previously 
described”. In brief, bones were fixed for 18 h in 4% paraformaldehyde
before being decalcified using 10% EDTA (pH 8) for two weeks.
Longitudinal bone sections (250 µm thick) were blocked, permeabilized
(followed by additional blocking of endogenous avidins and biotins) 
and stained overnight at room temperature with primary antibod- 
ies (anti-GFP (chicken, Aves Labs, GFP-1020), anti-CD117 (goat, R&D, 
AF1356), anti-CD105 (rat, eBioscience, 14-1051-82) and anti-collagen 
type I (rabbit, Cedarlane, CL50151AP)), secondary antibodies (Alexa
Fluor 555, 680, and CF633) and DAPI (Thermo Fisher Scientific, D1306).
The GFP signal was amplified using donkey anti-chicken biotin (Jackson 
Immunoresearch, 703-065-155) followed by streptavidin Alexa 488 
(Thermo Fisher Scientific, S32354). Full-bone scans were performed
using a Leica TCS SP8 confocal microscope equipped with three photo- 
multiplier tubes and two HyD detectors using type F immersion liquid 
(RI: 1.518) and a 20x multiple immersion lens (NA 0.75, FWD 0.680 mm). 
Images were acquired at 8-bit, 400 Hz and 1,024 x 1,024 resolution with 
2.49 µm z-spacing. Segmentation and distance quantification analysis
were performed with Imaris (version 8.3.1), using the XT and Distance 
Transformation XTension modules. To avoid data truncation, data 
were transformed to 16 bit before distance quantification and then
converted back to 8 bit. Random dots were generated using Matlab-based,
self-developed software (XiT) as previously described”. Graphs and
statistical analyses were performed using GraphPad Prism version 6.
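
The reason for the temporary 16-bit conversion can be shown with a small NumPy example. It assumes that distance values are stored as pixel intensities and that out-of-range values saturate; how Imaris handles overflow internally is not stated in the text, so this is an illustration of the general bit-depth limit, with made-up distance values.

```python
import numpy as np

# Hypothetical distance-transform values; several exceed the 8-bit maximum of 255.
distances = np.array([12.0, 200.0, 255.0, 300.0, 410.0])

# Stored as 8 bit, every distance above 255 collapses to 255 (truncation).
as_8bit = np.clip(distances, 0, 255).astype(np.uint8)

# Stored as 16 bit (maximum 65,535), the same distances are preserved.
as_16bit = np.clip(distances, 0, 65535).astype(np.uint16)

truncated = int(np.count_nonzero(as_8bit != as_16bit))
```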


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The GEO accession number for GFP cells is GSE115908. The GEO acces- 
sion number for LSK cells used for overlay has been published previ- 
ously (GSE90742)’8. 
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Extended Data Fig. 4 | Additional characterization of MFG-HSCs. 

a, b, InDrops scRNA-seq analysis of MFG* cells in comparison to multiple 
populations of HSC and MPPs. MFG cells (46 cells) are predominantly foundin 
areas where Mecom (purple, n= 742 cells), but not F/t3 (orange, n=1,111 cells) is 
expressed. Teal, MPP2; purple, MPP3; light green, MPP4; grey, other cells; 
bright green, Mds1°""Fit3™ cells. Gradient colour demonstrates normalized 
read counts. Each dot represents an individual cell. MFG-HSCs represent cells 
froma single mouse, the rest of the cells represent cells from a separate single 
mouse. c, d, Heatmaps (c) and spring plot map (d) showing expression levels of 
previously published ‘dormant’ HSC genes in scRNA-seq data from LTHSC and 
MFG cell populations. For the spring plot analysis: MFG, n= 46 cells; 
CD34,n=2,380 cells (teal); each dot represents an individual cell. MFG-HSCs 
represent cells froma single mouse; the rest of the cells represent cells froma 


separate single mouse. e, Single-cell transcriptional fluidigm profile of MFG- 
HSCs demonstrates that they cluster together with LT-HSCs. f, Summary of 
transplants with 3, 7, or 15 MFG or SLAM HSCs together with 100,000 bone 
marrow cells, analysed 4 months after transplantation. HSC frequencies were 
calculated using ELDA software (see Methods). g, Engraftment analysis 
following secondary transplantations using whole bone marrow from one 
primary recipient of 25 MFG* HSCs. Experiment shown is representative of 
three independent experiments. h, Percentage chimaerism at 4, 8,12, 16 and 20 
weeks in primary recipients transplanted with 25 SLAM cells sorted on the basis 
of GFP expression isolated from Mds1“"""Flt3™ mice (n=12 GFP” mice, n=5 
GFP* mice). Our data demonstrate that GFP* cells within the SLAM 
compartmentare more functionally enriched. Eachline represents an 
individual mouse. 
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Extended Data Fig. 5 | See next page for caption. 
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Extended Data Fig. 5 | Multicolour quantitative deep-tissue confocal 
imaging of complete femoral sections from MFG (Mds1°""FIt3“) mice. 

a, Identification of C-KIT+GFP+ MFG-HSCs using multicolour quantitative deep-tissue
confocal imaging of full-bone femoral sections. Pictures are 10-µm xy
projections of one area of interest. n = 3 mice. The experiment was performed
three times with similar results. b, Example of one full-bone femoral section
with colour-coded visualization of HSCs based on their distance from bone.
Yellow squares represent individual HSCs in proximity to cortical or trabecular
bone, whereas green dots represent individual HSCs located more than 10 µm
away. The picture represents data from an individual mouse. The experiment
was performed three times with similar results (d). c, Example of full-bone
femoral section (only Col.1 and DAPI staining are shown). The experiment was
performed three times with similar results. d, Colour-coded visualization of
HSCs based on their distance to bone. Yellow squares represent individual
HSCs in proximity to cortical or trabecular bone, whereas green dots represent
individual HSCs located more than 10 µm away. This picture represents an
independent mouse from b. The experiment was performed three times with
similar results. e, Quantification of absolute number and anatomical location
of C-KIT+GFP+ MFG-HSCs per individual experiment (n = 3 mice). f, Spatial
distribution of HSCs (circles) and random dots (triangles) relative to Col.1
marking bone surfaces (left panel) and CD105+ vasculature (sinusoids, right
panel) (n = 3 mice). P values were calculated using two-tailed Kolmogorov-Smirnov
(distance distributions, left panel P = 0.1516, right panel P > 0.9999)
and one-tailed Mann-Whitney (first bin of histograms, left, HSCs: 8.56 ± 5.74,
random dots: 6.88 ± 1.94, P = 0.50; right, HSCs: 67.52 ± 10.99, random dots: 68.53 ± 3.51,
P = 0.35) tests. Data points with mean ± s.d. (red for HSCs, blue for random
dots). NS, not significant. Epi: epiphysis, meta: metaphysis, dia: diaphysis.

Extended Data Fig. 6 | Synthesis, structure and characterization of
phosphorescent probe Oxyphor PtG4. The structure of Oxyphor PtG4 is 
almost identical to that of the previously published probe Oxyphor PdG4”, but 
it contains Pt instead of Pd at the core of the porphyrin (1: Pt tetra-meso-3,5- 
dicarboxyphenyl-tetrabenzoporphyrin). a, Synthesis of Oxyphor PtG4. First, 
eight carboxyl groups on the porphyrin 1 were amended with 4-amino- 
ethylbutyrate linkers. Upon hydrolysis of the terminal esters in the resulting 
porphyrin 2, eight aryl-glycine dendrons (H,N-AG?(OBu),) were coupled to the 
resulting porphyrin-octacarboxylic acid, giving dendrimer 3. The butyl esters 
on the latter were hydrolysed under mild basic conditions, and the
resulting free carboxylic acid groups were amidated with
monomethoxypolyethyleneglycol amine (MeO-PEG-NH2, Av. MW 1,000), giving the
target probe Oxyphor PtG4. MALDI-TOF (m/z) was used to confirm the identity 
of the intermediate products as well as of the target probe molecule. Structure 
2 (Cy6Hi4Ny02,4Pt, calculated at MW 2,263.85) was found 2,264.48 [M]*; 
structure 3 (Cy¢sHs4oN6oO0p0Pt, calculated at MW 9,114.76) was found at 9,115.68 
[M+H]* and Oxyphor PtG4 (Cy7g9H319¢No20792Pt, calculated at MW 40,538) was 
found at 35,952. For Oxyphor PtG4 we identified an additional peak at MW 
66,123.6 which is probably due to the presence of dimeric species formed 
during the ionization process. b, Linear (one-photon) absorption (green) and
emission spectra (red) of PtG4 in 50 mM phosphate buffer solution (pH 7.2,
λ = 623 nm). Photophysical constants in PBS, 22 °C: ε(623) ~ 90,000 M−1 cm−1
(molar extinction coefficient), φphos(deox) ~ 0.07 (phosphorescence quantum
yield in deoxygenated solution), τair = 16 µs (phosphorescence decay time in
air), τdeox = 47 µs (phosphorescence decay time in deoxygenated solution).

c, Phosphorescence oxygen quenching plot of Oxyphor PtG4. The calibration 
was performed as previously described®’. The experimental points were fitted 
to an arbitrary double-exponential form and the obtained parametric equation 
was used to convert the phosphorescence lifetimes obtained in in vivo
experiments to pO, values. d, Two-photon absorption spectrum of PtG4 in 
deoxygenated dimethylacetamide (DMA, 22 °C). e, Arbitrarily scaled one- 
(green line) and two-photon (blue line) absorption spectra of PtG4. The two- 
photon absorption (2PA) spectra of PtG4 and of the reference compounds were 
measured by the relative phosphorescence method, as previously described*. 
The laser source was a Ti:Sapphire oscillator (80 MHz rep. rate) with tunability
range of 680-1,300 nm (InSight DeepSee, Spectra-Physics). All optical
spectroscopic experiments and oxygen titrations were performed at least
three times, giving highly reproducible results. f, Representative intravital 
images of an HSPC (green, left image), MFG-HSC (green, right image), 
vasculature (grey, Rhodamine-B-dextran 70 kDa), and autofluorescence (blue) 
overlaid with localized oxygenation measurements. White arrows, GFP cells. 
Black arrow, colour representing 10 mm Hg. Coloured squares represent 
individual localized oxygen measurement areas. Images represent data from 
two independent experiments for each mouse model. Scale bars, ~50 µm.

Extended Data Fig. 7 | Increased motility and expansion of activated MFG-HSCs.
a, Schematic illustration of protocol for activating bone marrow HSCs
using Cy/GCSF. b, Flow cytometry analysis of Cy/GCSF-treated MFG mice (n = 3
mice). Data show Lineage− cells. Mean ± s.d. c, Number of GFP+ cells identified
per calvaria in untreated and Cy/GCSF-treated Mds1GFP/+ Flt3Cre mice (n = 5 and 4
mice, respectively). Red bars indicate mean. P was calculated using two-tailed
Mann-Whitney test. d, Cell cycle analysis of MFG+ cells from Cy/GCSF-treated
mice. Three mice were pooled together to acquire the displayed data. e, Graph
showing in vivo motility measurements of HSPCs (n = 66 cells) and MFG-HSCs
(n = 30 cells) at steady state and of activated MFG-HSCs (n = 142 cells) in the
calvaria. Red bars indicate mean. P values were calculated using two-tailed
Mann-Whitney test. f, g, Distance from MFG+ cells to the endosteum (n = 24 and 12
cells for untreated and Cy/GCSF-treated, respectively) and to the nearest vessel
(n = 20 and 17 cells for untreated and Cy/GCSF-treated, respectively), after
treatment with Cy/GCSF. Red bars indicate mean. P values were calculated using
two-tailed unpaired t-test.


Extended Data Fig. 8 | Characterization of MFG-HSCs upon activation. a,
Bone marrow analysis of HSPC (Mds1GFP/+) PBS control (n = 1 mouse) and HSPC
(Mds1GFP/+) 5-FU-treated mice (n = 2 mice, value represents mean), 17 days after
treatment. Data show marked expansion of HSPCs even after recovery of blood
(Extended Data Fig. 1e). b, Graph showing in vivo motility measurements of
MFG-HSCs at days 4 (n = 14 cells) and 20 (n = 13 cells) after 5-FU treatment. Red
bar represents mean. Compare to untreated Mds1GFP/+ Flt3Cre mice in Fig. 3a and
Extended Data Fig. 7e. P was calculated using two-tailed Mann-Whitney test.
c, Representative map of the locations of MFG-HSCs in the calvaria on day 20
after 5-FU treatment (n = 2 mice). Scale bar, ~500 µm. d, Generation of
Mds1CreER/+ Rosa26Confetti/+ mice. e, Schematic illustration of Cy/GCSF treatment
protocol for multicoloured Mds1CreER/+ Rosa26Confetti/+ labelling and activation.
Low tamoxifen dosage (2 mg) was used to restrict recombination and
expression of fluorescence to LT-HSCs that express higher levels of Mds1.
f, Detailed flow cytometry analysis of MPP3/4 cells, ST-HSCs and LT-HSCs with
differential colour labelling upon treatment of Mds1CreER/+ Rosa26Confetti/+ mice
shows labelling enriched in but not fully restricted to LT-HSCs. The experiment
was performed once. g, 2D maps of the 3D locations of activated and labelled
HSPCs in the fixed calvaria of control (left top, tamoxifen only, n = 2 mice) and
induced mice (left bottom, tamoxifen + Cy/GCSF, n = 3 mice) along with
maximum intensity projection (MIP) images (right top and bottom) of the
Mds1-labelled cells (red, green, and blue). Scale bars for graphical map and MIP
images, ~200 µm and 50 µm, respectively. h, Colour purity of cell clusters
(original colours) compared to randomized colours (10,000 cycles) in three
independent experiments (n = 3 mice). P values were calculated using two-tailed
unpaired t-test. Bar graphs with error bars represent mean and s.d.,
respectively.

Extended Data Fig. 9 | Validating bone cavity types using 2.3Col1-GFP
(mature osteoblasts) and cathepsin K-activated fluorescent agent
(osteoclasts). a, A montage of multiple z-stacks, displayed as the maximum
intensity projection, showing double staining of bone marrow cavities in the
calvarium. b, The same area as in a, showing the locations of 2.3Col1-GFP
osteoblasts in areas of the old bone front that has not been eroded (n = 3 mice).
c, Quantification of 2.3Col1-GFP pixels in D-type (n = 10 regions), M-type (n = 16
regions) and R-type cavities (n = 18 regions). Mean ± s.d. d, A montage of
multiple z-stacks, displayed as the maximum intensity projection, showing the
double staining pattern (blue and red), 2.3Col1-GFP cells (green), osteoclasts
(white), and bone marrow vasculature (purple). White arrows, osteoclast
clusters. n = 3 mice. e, A zoomed-in region from d (box A), showing correlation
between 2.3Col1-GFP cells and the remaining dye 1 (blue) in a D-type cavity,
and abundant cathepsin K+ osteoclasts in the R-type region where dye 1 was
eroded. f, Example of an M-type region from d (box B). In this region, dye 1 was
eroded to some extent in spite of the presence of abundant 2.3Col1-GFP cells in
the cavity. The corresponding cathepsin K panel shows the co-existence of
several cathepsin K+ osteoclasts. g, Quantification of cathepsin K+ pixels in
D-type (n = 11 regions), M-type (n = 33 regions), and R-type (n = 10 regions)
cavities based on maximum intensity projection of montaged z-stacks.
Compared to c, cathepsin K coverage shows a larger spread because it does not
stain the cell body uniformly. Instead it frequently shows a punctate staining
pattern, which is likely to represent lysosomes and endosomes. *P < 0.0189;
**P = 0.0015; ****P < 0.0001; two-sided Mann-Whitney test. Mean ± s.d.

Extended Data Fig. 10 | Cell distribution in D-, M- and R-type cavities before
and after Cy/GCSF treatment. n = 4 mice per group. Graphs show the fractions
of MDS or MFG cells distributed in D-, M- and R-type cavities at the steady state
and after Cy/GCSF treatment. The fraction is calculated as the total cells found
in each cavity type divided by the total cells found in the calvaria of that mouse.
a, The fractions of MFG cells increased in M-type cavities but decreased in
D-type cavities after Cy/GCSF treatment. Mean ± s.d. Non-treated groups:
24.5 ± 12.8, 54.3 ± 12.6 and 21.3 ± 15.6 in D-, M- and R-type cavities, respectively.
Treated groups: 0.5 ± 1.0, 96.0 ± 4.7 and 3.5 ± 4.4 in D-, M- and R-type cavities,
respectively. **P = 0.0096; ***P = 0.0008. b, The fractions of cells decreased in
D-type cavities but remained the same in M- and R-type cavities. Mean ± s.d.
Non-treated groups: 20.5 ± 5.6, 66.5 ± 2.4 and 13.3 ± 3.6 in D-, M- and R-type
cavities, respectively. Treated groups: 6.8 ± 2.5, 75.0 ± 9.6 and 18.8 ± 8.9 in D-,
M- and R-type cavities, respectively. **P = 0.004; unpaired, two-tailed t-test.

Extended Data Fig. 11 | Heterogeneous bone remodelling in bone marrow
cavities of tibia metaphysis. A mechanically thinned metaphysis was imaged
from the bone surface, labelled by sequential calcium staining. a-c, En face
views of D-, M- and R-type cavities from tibia metaphysis. d-f, x-z cross-section
views from annotated white lines in Supplementary Video 15 show bone
marrow cavities of varied remodelling stages similar to mouse calvaria.


Extended Data Table 1 | Activated MFG-HSCs (Mds1GFP/+ Flt3Cre
mice) are characterized by increased motility and various
cellular interactions between GFP cells


Table of observed MFG-HSC behaviours in Cy/GCSF-treated mice; n = 4.

Extended Data Table 2 | Summary table of findings from live imaging of native HSCs versus transplanted HSCs

Native HSCs (this work) | Transplanted HSCs
Adjacent to both endosteum and sinusoidal blood vessels | Adjacent to both endosteum and blood vessels (did not identify type) (Lo Celso C et al, Nature 2009)
Do not reside in regions with deepest hypoxia | Do not home in regions with deepest hypoxia (Spencer JA et al, Nature 2014)
Sessile; become motile after activation | Sessile; become motile after activation (Rashidi NM et al, Blood 2014)
Proliferate and form clusters after Cy/GCSF or 5-FU | Proliferate and form clusters after transplantation (Lo Celso C et al, Nature 2009)

Software and code


Policy information about availability of computer code 


Data collection Data collection methodology for calvaria imaging and full bone imaging is described in detail in the methods section. For full-bone 
immunofluorescence confocal imaging was performed using the default acquisition software of the Leica TCS SP8 confocal microscope as 
previously described (Coutu D.L. et al, Nature Methods 2018). Calavaria imaging data collection was performed as described in detail in 
the methods section and as previously described in Spencer J.A. et al, Nature 2014. 
Data analysis: Data analysis methodology for calvaria imaging and full-bone imaging is described in detail in the methods section. Custom code required
for the analysis of full-bone immunofluorescence confocal imaging data was previously described (Coutu D.L. et al, Nature Methods
2018) and is available for download at https://www.bsse.ethz.ch/csd/software/XiT.html. In addition, GraphPad Prism (version 7) and
Imaris (8.3.1 and 9.1.2) were used for the corresponding data analysis and insertion of random dots for statistical comparison. For
calvaria imaging data analysis, Olympus FluoView software, Matlab, ImageJ scripts and GraphPad Prism (version 6 or higher) were used.
Several built-in plugins from ImageJ were used, including contrast enhancement, 3D ImageJ Suite, background subtraction, global/local
thresholds, and 3D Euclidean distance measurements. For the single-cell RNA sequencing analysis, the code is available upon request. For
single-cell Fluidigm experiments, data analysis and hierarchical clustering were performed using the MultiExperiment Viewer (MeV) program.
For the single-cell RNA sequencing experiment, the data were processed using a previously published workflow and code available at
https://github.com/AllonKleinLab/SPRING. Any additional Python scripts used for graph plotting of the RNA sequencing data are described
in detail in the methods section. For the synthesis and characterization of the phosphorescent oxyphor probe, MALDI-TOF was used to
confirm the identity of the intermediate products and of the target probe molecule. To estimate the HSC frequency (MFG cells) in bone
marrow, we used the extreme limiting dilution analysis software available at http://bioinf.wehi.edu.au/software/elda/. GraphPad Prism
and Excel were used to analyze and display all mouse characterization-related data.
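
The "insertion of random dots for statistical comparison" and the 3D Euclidean distance measurements mentioned above can be illustrated with a minimal sketch. This is not the authors' pipeline, which relied on Imaris, XiT and ImageJ; the coordinates, the bounding box, the number of random dots and all function names below are hypothetical and chosen only for illustration.

```python
import math
import random


def nearest_distance(point, landmarks):
    """Smallest 3D Euclidean distance from one point to any landmark point."""
    return min(math.dist(point, landmark) for landmark in landmarks)


def random_dots(n, bounds, rng):
    """Draw n points uniformly inside an axis-aligned box.

    bounds is ((xmin, xmax), (ymin, ymax), (zmin, zmax)).
    """
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]


def mean_nearest_distance(points, landmarks):
    """Average distance from each point to its nearest landmark."""
    return sum(nearest_distance(p, landmarks) for p in points) / len(points)


# Hypothetical coordinates in micrometres (not data from the study).
vessels = [(0.0, 0.0, 0.0), (50.0, 0.0, 0.0), (0.0, 50.0, 10.0)]
cells = [(3.0, 4.0, 0.0), (50.0, 0.0, 12.0)]

rng = random.Random(0)  # fixed seed so the null sample is reproducible
dots = random_dots(1000, ((0, 100), (0, 100), (0, 20)), rng)

observed = mean_nearest_distance(cells, vessels)
expected_random = mean_nearest_distance(dots, vessels)
print(f"cells: {observed:.1f} um, random dots: {expected_random:.1f} um")
```

Comparing the observed distances with those of randomly placed dots gives a reference for what proximity to a structure would look like by chance; the statistical test applied to the two distributions is a separate choice that this sketch leaves open.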


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers 
upon request. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 
- A description of any restrictions on data availability


All raw data from all in vivo experiments have been made available with the manuscript as source data in Excel format or as supplementary information. The RNA-seq
data are publicly available under GEO accession code GSE115908.
Life sciences study design


All studies must disclose on these points even when the disclosure is negative.


Sample size: No sample-size calculations were performed. The sample size was determined based on previous similar studies (Zhang Y et al, Blood 2011)
and was adequate based on consistency of measured results in each group.


Data exclusions: Some data were excluded based on high variability and deviation from the mean in the % GFP-positive cells present in Mds1-GFP/+ Flt3-Cre
mice. The 5 highest and 5 lowest values were excluded to ensure proper representation of the calculated % GFP-positive cells. Exclusion criteria
were not pre-established.
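
The exclusion rule stated above (drop the 5 highest and 5 lowest values) amounts to symmetric trimming. A minimal sketch follows, assuming the values are held in a plain list; the measurements and function names are hypothetical and are not data or code from the study.

```python
def trim_extremes(values, k=5):
    """Return the values sorted, with the k lowest and k highest removed."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if len(values) <= 2 * k:
        raise ValueError("need more than 2*k values to trim")
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical % GFP-positive measurements, for illustration only.
pct_gfp = [0.2, 1.1, 1.3, 1.4, 1.6, 2.0, 2.1, 2.2, 2.4,
           2.5, 3.9, 4.4, 5.0, 7.8, 12.5]

kept = trim_extremes(pct_gfp, k=5)
print(len(pct_gfp), "values measured,", len(kept), "kept after trimming")
print("mean before:", round(mean(pct_gfp), 2), "mean after:", round(mean(kept), 2))
```

Because the number of trimmed values is fixed rather than derived from a predefined outlier criterion, the result depends on the sample size, which is consistent with the statement that exclusion criteria were not pre-established.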


Replication: Experimental findings were reliably reproduced. In the rare cases in which large variability was observed, it is indicated with the corresponding SD. To
verify reproducibility of the findings, the vast majority of experiments were repeated at least three independent times.


Randomization: Weaned mice from Mds1-GFP/+ x Flt3-Cre crosses, Mds1-CreER/+ x Rosa26-lox-stop-lox-Confetti crosses and Mds1-GFP/+ x C57/BL6 crosses
were separated into male and female cages. Adult animals (2-6 months) of both sexes of the corresponding genotypes were randomly selected
for all experiments and used for BM isolation.


Blinding: No blinding was performed during data collection and analysis. Because the study compares specific genotypes with or without treatment, we
had to pre-determine the genotype of each mouse, followed by the type of treatment performed and the subsequent analysis. Thus, blinding during
experiments and data acquisition was not possible in this study.
Materials & experimental systems | Methods
n/a | Involved in the study | n/a | Involved in the study
Unique biological materials | ChIP-seq
Antibodies | Flow cytometry
Eukaryotic cell lines | MRI-based neuroimaging
Obtaining unique materials: All unique materials used are readily available from the authors.


Antibodies


Antibodies used: All antibodies were purchased from BioLegend, BD Biosciences, eBioscience, Invitrogen, Cedarlane, Aves and R&D. The
corresponding manufacturers per antibody are indicated in methods. All antibodies were used at 1:100 concentration unless
otherwise stated.

Antibody | Company | Catalog number | Concentration
Flt3 PE-Cy5 | eBioscience | 15-1351-82 | 1:50
c-kit APC | eBioscience | 17-1171-83 | 1:100
c-kit AF700 | eBioscience | 56-1172-82 | 1:100
Sca-1 PE-Cy7 | Invitrogen | 25-5981-82 | 1:100
CD34 Biotin | eBioscience | 13-0341-85 | 1:33
FcγR BV421 | BioLegend | 101331 | 1:50
CD150 PE-Cy5 | BioLegend | 115912 | 1:100
CD41 BV605 | BioLegend | 133921 | 1:100
CD48 APC-Cy7 | BD Pharmingen | 561242 | 1:100
CD45.2 V450 | BD Horizon | 560697 | 1:100
B220 APC | eBioscience | 17-0452-83 | 1:100
B220 Biotin | eBioscience | 13-0452-85 | 1:100
CD19 APC-Cy7 | eBioscience | 47-0193-82 | 1:100
CD19 Biotin | Invitrogen | 13-0193-85 | 1:100
Mac1 APC-Cy7 | eBioscience | 47-0112-82 | 1:100
CD4 APC | Invitrogen | 17-0041-83 | 1:100
CD4 Biotin | eBioscience | 13-0041-85 | 1:100
CD8a APC | eBioscience | 17-0081-83 | 1:100
CD8a Biotin | Invitrogen | 13-0081-85 | 1:100
Ly-6G AF700 | BD Pharmingen | 561236 | 1:100
IgM eFluor 450 | eBioscience | 48-5890-82 | 1:100
Gr-1 APC | eBioscience | 17-5931-82 | 1:100
Gr-1 Biotin | eBioscience | 13-5931-85 | 1:100
Ter119 PE-Cy5 | eBioscience | 15-5921-83 | 1:100
Ter119 Biotin | Invitrogen | 13-5921-85 | 1:100
Streptavidin APC | eBioscience | 17-4317-82 | 1:100
Streptavidin APC-Cy7 | eBioscience | 47-4317-82 | 1:100
Streptavidin PE-Cy7 | eBioscience | 25-4317-82 | 1:100
Ki67 PE-Cy7 | BioLegend | 652426 | 1:100
anti-GFP | Aves Labs | GFP-1020 | 1:50
anti-CD117 | R&D Systems | AF1356 | 1:50
anti-collagen type I | Cedarlane | CL50151AP | 1:50
anti-CD105 | eBioscience | 14-1051-82 | 1:50
Alexa Fluor 488 streptavidin conjugate | Thermo Fisher Scientific | S32354 | 1:50
Alexa Fluor 555 | Thermo Fisher Scientific | A21432 | 1:50
Alexa Fluor 633 | Biotium | 20137 | 1:50
Alexa Fluor 680 | Thermo Fisher Scientific | A10043 | 1:50
donkey anti-chicken biotin | Jackson ImmunoResearch | 703-065-155 | 1:50
DAPI | Thermo Fisher Scientific | D1306 | 1:2000
Validation: All antibodies used are well characterized and validated by the providers. For all flow cytometry antibodies, validation was performed
by the manufacturer in the mouse system using isotype control Abs as well as specific cell types that are known to be negative or positive for the
corresponding Ab. For all Abs used for flow cytometry, validation of expression and fluorescence was
performed by the manufacturer using flow cytometry analysis. All validation information for each Ab, as well as previous
publications that have used each Ab, can be found on the manufacturer's website.


Animals and other organisms 
Laboratory animals: All animals used in this study for BM and imaging analysis are of Mus musculus species, C57/BL6 background strain independent
of genotype and 2-6 months of age. Both males and females were used for all experiments. Mds1-GFP/+, Mds1-GFP/+ Flt3-Cre,
Mds1-CreER/+ Rosa26-CAG-lox-stop-lox-Confetti/+ and Rosa26-CAG-lox-stop-lox-tdTomato/+ mice were generated as described in detail
in methods or purchased from JAX and crossed with our generated strains. A detailed description of the procedure followed to
generate the Mds1-GFP/+, Mds1-GFP/+ Flt3-Cre and Mds1-CreER mouse lines is included in the methods section. For Tamoxifen
induction and Cyclophosphamide/GCSF experiments, the procedure and the doses are described in detail in the methods
section.


Wild animals: The study did not involve wild animals.


Field-collected samples: The study did not involve field-collected samples.


Flow Cytometry
The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).
The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).
All plots are contour plots with outliers or pseudocolor plots.
A numerical value for number of cells or percentage (with statistics) is provided.


Bone marrow isolation was performed using crushing methodology, followed by red blood cell lysis. MACS beads and
QuadroMACS (LS columns) were used for lineage depletion or c-kit enrichment. 40 µm filters were used to ensure a single-cell
suspension prior to FACS analysis. Antibody staining was performed for 45 min, on ice, in PBS-2% FBS to ensure good staining and
high viability of the cells. The detailed BM isolation protocol is included in the methods section.


A BD LSR II flow cytometer was used for flow cytometry analysis; a BD FACSAria II was used for cell sorting.
BD FACSDiva software was used for data collection; FlowJo software (Tree Star) was used for data analysis.


Cells were sorted with Purity modes at 80-85% efficiency. Post-sort fractions analyzed were at least 95% pure. Sorted samples
were double sorted to ensure purity of the sorted populations. In experiments in which low numbers of sorted cells were
used, cells were sorted a second time directly into plates to ensure accuracy of cell number.


Gating strategy: FSC/SSC preliminary gating was used to exclude debris (lower FSC) and to restrict the main bone marrow population, including
larger cells such as granulocytes that are found at higher SSC levels. Back-gating was used to ensure that all excluded populations
were debris or dead cells (positive for DAPI) and that all included populations were part of the various positive antibody
fractions. FSC-H vs. FSC-A was used to exclude doublet cells. SSC-H vs. SSC-A was used as an additional secondary doublet
exclusion step. Dead cells were identified using DAPI staining in all experiments. Gating for negative/positive populations for all
antibodies was performed using a negative control (no stain) followed by a single-color positive control for each antibody.
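
The order of the preliminary gates described above (debris by low FSC, doublets by FSC-H vs. FSC-A and then SSC-H vs. SSC-A, dead cells by DAPI) can be sketched as a sequence of filters. In the study the gates were drawn in FlowJo; the numeric thresholds, the ratio-based doublet rule and the event layout below are assumptions made only for illustration.

```python
from statistics import median


def gate_events(events, fsc_min=20_000, dapi_max=1_000, ratio_tol=0.15):
    """Apply debris, doublet and viability gates in sequence.

    Each event is a dict with positive values for the keys
    'FSC-A', 'FSC-H', 'SSC-A', 'SSC-H' and 'DAPI'.
    All thresholds are hypothetical placeholders, not values from the study.
    """
    # 1. Debris exclusion: drop events with low forward scatter.
    cells = [e for e in events if e["FSC-A"] >= fsc_min]

    # 2. Doublet exclusion: singlets keep a similar height/area ratio,
    #    first on forward scatter, then on side scatter.
    for area, height in (("FSC-A", "FSC-H"), ("SSC-A", "SSC-H")):
        if not cells:
            return []
        ratios = [e[height] / e[area] for e in cells]
        centre = median(ratios)
        cells = [e for e, r in zip(cells, ratios)
                 if abs(r - centre) <= ratio_tol * centre]

    # 3. Viability: DAPI-positive events are treated as dead cells.
    return [e for e in cells if e["DAPI"] < dapi_max]
```

A fixed tolerance around the median ratio is a simplification of a manually drawn polygon gate; it is used here only because it makes the logic of each step explicit.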


Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 
Neural control of the function of visceral organs is essential for homeostasis and
health. Intestinal peristalsis is critical for digestive physiology and host defence, and
is often dysregulated in gastrointestinal disorders. Luminal factors, such as diet and
microbiota, regulate neurogenic programs of gut motility, but the underlying
molecular mechanisms remain unclear. Here we show that the transcription factor
aryl hydrocarbon receptor (AHR) functions as a biosensor in intestinal neural circuits,
linking their functional output to the microbial environment of the gut lumen. Using
nuclear RNA sequencing of mouse enteric neurons that represent distinct intestinal
segments and microbiota states, we demonstrate that the intrinsic neural networks of
the colon exhibit unique transcriptional profiles that are controlled by the combined
effects of host genetic programs and microbial colonization. Microbiota-induced
expression of AHR in neurons of the distal gastrointestinal tract enables these
neurons to respond to the luminal environment and to induce expression of
neuron-specific effector mechanisms. Neuron-specific deletion of Ahr, or constitutive
overexpression of its negative feedback regulator CYP1A1, results in reduced
peristaltic activity of the colon, similar to that observed in microbiota-depleted mice.
Finally, expression of Ahr in the enteric neurons of mice treated with antibiotics
partially restores intestinal motility. Together, our experiments identify AHR
signalling in enteric neurons as a regulatory node that integrates the luminal
environment with the physiological output of intestinal neural circuits to maintain
gut homeostasis and health.


The enteric nervous system (ENS) encompasses the intrinsic neural
networks of the gastrointestinal tract, which regulate most aspects
of intestinal physiology (including peristalsis). In addition to
host-specific genetic programs, microbiota and diet have emerged as
critical regulators of the physiology of gut tissue, and changes in the
microbial composition of the lumen often accompany gastrointestinal
disorders. Thus, depletion of the microbiota causes a reduced
excitability of enteric neurons, changes in motility programs (such as the
neurogenic colonic migrating motor complexes) and prolonged
intestinal transit time (ITT). However, conventionalization of adult
germ-free mice reduces the deficit in ITT and restores neuronal
excitability, which suggests that intestinal neural circuits are endowed with
molecular mechanisms that monitor the state of the gut lumen and
adjust neuronal activity and motility accordingly. Despite considerable
recent progress in describing the effects of the microbiota and diet on
gastrointestinal physiology, the molecular mechanisms by which the
luminal environment regulates ENS activity and intestinal peristalsis
remain unknown.

We hypothesized that molecular mechanisms that link the microbiota
to intestinal motor behaviour are likely to be encoded by genetic
programs that operate predominantly in neural circuits of the colon,
the intestinal segment with the heaviest load of microorganisms.
We therefore used RNA sequencing to identify genes that are
specifically upregulated in enteric neurons of the mouse colon in response
to microbial colonization. Because our pilot experiments indicated
that the current protocols for tissue dissociation and the recovery of
intact ENS cells often resulted in considerable cellular damage and
non-specific transcriptional changes, we developed a strategy that uses an
adeno-associated virus (AAV) for labelling followed by the isolation and
RNA sequencing of enteric neuron nuclei (nRNA-seq) that represent
different intestinal segments and microbiota states (Fig. 1a, Extended Data
Fig. 1a-l). First, we compared the transcriptional profiles of myenteric
neurons from the colon and small intestine of conventionally raised
specific-pathogen-free (SPF) mice. To minimize potential variation in
gene expression due to diet or other environmental factors, we used
mice from two independent animal facilities (the Francis Crick Institute
and University of Bern) to produce two datasets (the Crick and Bern
datasets, respectively). Transcriptional profiles of neuronal nuclei
clustered exclusively according to the intestinal segment of origin
(Extended Data Fig. 1m), indicating that neurons derived from the
small intestine or colon express distinct transcriptional programs.


Fig. 1 | Programming of the enteric neuron transcriptome by microbiota.
a, Experimental design for AAV-mediated expression of nuclear-localized
enhanced green fluorescent protein (eGFP) fused to the KASH nuclear
membrane retention domain (eGFP-KASH) in myenteric neurons and RNA
sequencing of neuronal nuclei purified by fluorescence-activated cell sorting
(FACS). The plasmid used to generate the AAV9-CaMKII-eGFP-KASH vector is
shown at the top. i.v., intravenous. b, c, Volcano plots showing mean
log2-transformed fold change (x axis) and significance (-log10(adjusted P value)) of
differentially expressed genes between myenteric neurons of the small
intestine and the colon of SPF (b) and germ-free (GF) (c) mice. Differential
gene-expression analysis was carried out using the DESeq2 R package. The default
DESeq2 Wald test (two-sided) was used to determine P values for differential
gene expression. Adjusted P values for differential gene expression were
calculated with the DESeq2 default Benjamini-Hochberg method.
log2-transformed fold-change shrinkage was applied by having the betaPrior
parameter set to TRUE. Schematic of the small intestine and colon shown at the
top. SPF CUEGs and germ-free CUEGs are within the area demarcated by red
lines (log2-transformed fold change = 1.5 < maximum; P < 0.05) in the plots of
b and c, respectively. n = 4 SPF (Crick), 4 SPF (Bern) and 3 germ-free (Bern)
independent nuclear isolates, each representing 3 mice. d, Volcano plot
showing mean log2-transformed fold change (x axis) and significance
(-log10(adjusted P value)) of differentially expressed genes between colonic
myenteric neurons from SPF and germ-free mice (Bern). Schematics of the
colon from germ-free and SPF mice are shown at the top.
Microbiota-dependent CUEGs are within the area demarcated by red lines
(log2-transformed fold change = 1.5 < maximum; P < 0.05). e, Representative images
of myenteric ganglia (outlined by dotted line based on HuC/D immunostaining)
from germ-free (left) and SPF (right) mice hybridized with RNAscope probes
for Ahr, Prr5 and Fam20c. Scale bars, 30 µm. Data represent two independent
experiments.
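
The Fig. 1 caption refers to Benjamini-Hochberg adjustment of P values and to selecting genes beyond fold-change and significance cut-offs. The analysis itself was done with DESeq2 in R; the following is only a minimal Python sketch of the adjustment and of a threshold filter, under the assumption that the red lines correspond to a log2 fold change of at least 1.5 and an adjusted P below 0.05. It omits DESeq2 steps such as the Wald test, fold-change shrinkage and independent filtering, and the toy values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_upregulated(genes, lfc_min=1.5, alpha=0.05):
    """Names of genes with log2 fold change >= lfc_min and adjusted P < alpha.

    genes is a list of (name, log2_fold_change, raw_p_value) tuples.
    """
    padj = benjamini_hochberg([p for _, _, p in genes])
    return [name for (name, lfc, _), q in zip(genes, padj)
            if lfc >= lfc_min and q < alpha]


# Hypothetical toy input; not data from the study.
toy = [("geneA", 2.3, 0.005), ("geneB", 0.4, 0.01),
       ("geneC", 1.8, 0.03), ("geneD", 3.1, 0.04)]
hits = select_upregulated(toy)
print(hits)
```

In this toy example geneB has a small P value but is dropped because its fold change is below the cut-off, which is the same logic that separates the demarcated region of a volcano plot from the rest.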


Differential gene-expression analysis carried out on the combined
Crick and Bern datasets identified 254 genes that were upregulated
in enteric neurons of the colon (termed colon-upregulated ENS genes
(CUEGs)) of SPF mice (hereafter, SPF CUEGs) (Fig. 1b, Supplementary
Table 1). Among the top differentially expressed SPF CUEGs were genes
that are implicated in neuronal development and function, such as
Pou3f3 (which encodes a transcription factor), Ano5 (which encodes
a chloride channel), Pde1c (which encodes a regulator of intracellular
second messengers), Unc5d (which encodes a netrin receptor), Col25a1
(which encodes neuron-specific collagen) and Pantr2 (which is a
Pou3f3-adjacent non-coding transcript). To validate the nRNA-seq results, we
used fluorescence RNA in situ hybridization (RNAscope) to analyse,
at single-cell resolution, the relative expression of these genes in the
myenteric plexus of the small intestine and colon. As expected, the
hybridization signal was considerably stronger in enteric neurons of
the colon relative to the small intestine (Extended Data Fig. 2a). These
experiments demonstrate that the ostensibly homogeneous intrinsic
neural networks along the mammalian intestine express
segment-specific transcriptional programs.

To assess the contribution of microbiota to shaping the
transcriptional landscape of neural circuits along the gut, we also compared the
nuclear transcriptomes of enteric neurons from the small intestine and
colon of germ-free mice. Using the same statistical criteria, we
identified 122 CUEGs that are upregulated in colonic neurons of germ-free
mice (hereafter, germ-free CUEGs) (Fig. 1c, Supplementary Table 2).
The differential expression along the gut of many SPF CUEGs, including
those we analysed by in situ hybridization, was maintained in germ-free
mice (Extended Data Fig. 2b). These findings are consistent with parallel
studies that demonstrated a similar expression of pan-neuronal and
neuron-subtype markers in the myenteric plexus of SPF and germ-free
mice (Extended Data Fig. 3a-f); this suggests that the transcriptional
regionalization of the ENS along the mammalian gut is largely
independent of microbial colonization, and that the effects of microbiota on
ENS physiology are mediated by a small number of critical molecular
pathways. To identify these pathways, we next compared directly the
nuclear transcriptomes of colonic neurons from SPF and germ-free
mice. We identified 25 genes (which we term microbiota-dependent
CUEGs), the transcripts of which were more abundant in colonic
neurons from SPF mice relative to germ-free mice (Fig. 1d, Supplementary
Table 3). Several of the microbiota-dependent CUEGs (such as Fam20c)
were absent from the list of SPF CUEGs, which suggests that they are
likely to be expressed at comparable levels throughout the ENS but are
under regulation by the microbiota specifically in colonic neurons.
However, three fully annotated genes (Ahr, Dand5 and Prr5) were also
present in the list of SPF CUEGs (Supplementary Table 1), indicating
that these genes are upregulated specifically in colonic neurons in
response to microbiota colonization. RNAscope analysis confirmed
the higher expression of Ahr, Prr5 and Fam20c in colonic neurons from
SPF mice relative to germ-free mice (Fig. 1e). Together, these studies
reveal a previously unappreciated complexity of gene expression in
the mammalian ENS and demonstrate that the transcriptional
landscape of neural circuits along the gastrointestinal tract is shaped by the
integrated effect of host-specific genetic programs and environmental
factors such as the microbiota.
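
The reasoning in the paragraph above rests on comparing gene lists: genes in both the SPF CUEG list and the microbiota-dependent list are colon-enriched and microbiota-regulated, whereas genes only in the microbiota-dependent list are regulated by microbiota without being colon-enriched. A minimal sketch of that comparison follows. The sets contain only genes that the text names explicitly and are therefore deliberately incomplete (the full lists of 254 and 25 genes are in the Supplementary Tables); the function and key names are hypothetical.

```python
# Partial gene lists: only genes named explicitly in the text.
spf_cuegs = {"Pou3f3", "Pantr2", "Ahr", "Dand5", "Prr5"}
microbiota_dependent_cuegs = {"Fam20c", "Ahr", "Dand5", "Prr5"}


def compare_gene_sets(colon_enriched, microbiota_dependent):
    """Split two gene sets into their overlap and set-specific parts."""
    return {
        "colon_enriched_and_microbiota_dependent":
            colon_enriched & microbiota_dependent,
        "microbiota_dependent_only":
            microbiota_dependent - colon_enriched,
        "colon_enriched_only":
            colon_enriched - microbiota_dependent,
    }


result = compare_gene_sets(spf_cuegs, microbiota_dependent_cuegs)
for label, genes in result.items():
    print(label, sorted(genes))
```

With these partial lists, the overlap recovers the three annotated genes highlighted in the text and Fam20c falls into the microbiota-dependent-only group.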

Next, we investigated the dataset of microbiota-dependent CUEGs as
a potential source of regulatory and effector genes that link the
microbial environment of the distal intestine with the functional output of
colonic neural circuits. Initially, we focused on Ahr because it encodes
a transcription factor with activity that is regulated by a broad range
of microbial, dietary and endogenous metabolites (AHR ligands), and
which functions as a biosensor that is critical for intestinal
epithelial- and immune-cell homeostasis. Upon ligand binding, cytosolic AHR
translocates to the nucleus and induces expression (among others) of
genes that encode cytochrome P450 (CYP1) enzymes, which
metabolize AHR ligands and thereby terminate AHR signalling. To provide
evidence for a microbiota-AHR-neural-output axis in the gut, we
immunostained the outer muscular layer (which includes the myenteric
plexus) from the intestine of SPF and microbiota-manipulated mice
for AHR. In the muscularis externa of the colon of SPF mice, AHR was
expressed predominantly in myenteric neurons, which indicates that
these neurons represent the main target of AHR ligands in this gut layer
(Fig. 2a, Extended Data Fig. 4a-c). Almost all colonic myenteric neurons
identified by the expression of the pan-neuronal markers HuC and HuD
(designated hereafter as HuC/D) and subtype-specific (ChAT, nNOS,
calretinin, calbindin and NF-M) markers were positive for AHR (Fig. 2a,
Extended Data Fig. 4d-g). Neurons exhibited either a cytoplasmic or a
nuclear signal (Fig. 2a), suggesting differential activation of AHR across
the population of enteric neurons. In contrast to the colon, myenteric
neurons in the duodenum and the jejunum did not exhibit an AHR signal
(Fig. 2b, Extended Data Fig. 4h), although a relatively weak signal was
detected in the neurons of the distal ileum (Extended Data Fig. 4i). The
lower expression of Ahr in myenteric neurons of the small intestine in
comparison to the colon was also confirmed by quantification of the
RNAscope in situ hybridization signal for AHR transcripts (Extended
Data Fig. 4a, j-l). Together, these findings suggest that the expression
of Ahr in enteric neurons is commensurate to the microbial load along
the gut. In support of the microbiota-dependent expression of Ahr,
myenteric neurons in the colon of germ-free mice and antibiotic-treated
mice had reduced levels of AHR transcripts and a considerably weaker
immunostaining signal, which was reinstated after colonization with
the microbiota of SPF mice (Figs. 1e, 2c-e, Extended Data Fig. 4m-r).
On the basis of these experiments, we suggest that microbiota-induced
expression of Ahr in colonic neurons would enable lumen- or
tissue-derived ligands to activate AHR signalling in colonic neural circuits, and
thus contribute to the molecular profile and functional specialization
of these circuits.

To identify potential target and effector genes of AHR signalling in
the ENS, we next compared the nuclear transcriptomes of myenteric
neurons from the colon of control mice and mice treated with the
AHR ligand 3-methylcholanthrene (3MC). Among the 30 genes that
showed the highest fold change with this treatment (which we term
AHR-induced CUEGs) (Extended Data Fig. 5a) were Ahrr and Cyp1a1,
which are known transcriptional targets of Ahr in several types of
non-neuronal cells and which have important roles in the feedback
regulation of AHR signalling, either by repressing Ahr-dependent gene
expression (Ahrr) or by metabolizing AHR ligands (Cyp1a1).
AHR-dependent induction of Cyp1a1 in enteric neurons was confirmed by
quantification of the Cyp1a1-specific RNAscope signal in muscularis
externa from the colon of control and 3MC-treated mice (Fig. 2f-h),
and enhanced yellow fluorescent protein (eYFP) immunostaining
of the myenteric plexus from 3MC-treated Cyp1a1:cre;Rosa26eYFP
reporter mice, in which activation of AHR signalling results in
permanent expression of eYFP (Extended Data Fig. 5b-d). Querying the
list of AHR-induced CUEGs for potential regulators of neuronal
function downstream of the microbiota-AHR axis, we identified Kcnj12
(Extended Data Fig. 5a), a gene that encodes the inwardly rectifying K+
channel, subfamily J member 12 (Kir2.2) that regulates the excitability
of cardiac muscle and neuronal cells. Previous studies have detected
an inwardly rectifying current in mammalian enteric neurons, and
the addition of a specific blocker (ML-133) of the current driven by
the Kir2.x subfamily (Kir2.1, Kir2.2, Kir2.3 and Kir2.6) to live
preparations of myenteric plexus altered the electrically evoked firing of
enteric neurons (Extended Data Fig. 5e-g). Kcnj12 was found among
the top 30 microbiota-dependent CUEGs after marginal relaxation of
the P-value criteria (P < 0.06) (Supplementary Table 3), which raises the
possibility that this gene is regulated by microbiota and AHR signalling.
In support of this idea, RNAscope experiments showed covariance of
Kcnj12 and Cyp1a1 transcript levels in individual neurons in vivo after
administration of 3MC (Fig. 2i-k) and, similar to Ahr, expression of
Kcnj12 was reduced in myenteric neurons from germ-free mice and
antibiotic-treated mice (Fig. 2l-n, Extended Data Fig. 5h-j). We also
observed a correlation between Ahr and Kcnj12 expression in the enteric
neurons of the colon of SPF mice under normal conditions (Extended
Data Fig. 5k-m). To confirm that Ahr signalling regulates expression of
Kcnj12 in colonic neurons, we next used AAV-mediated gene transfer
(Fig. 1a) to generate mice with enteric-neuron-specific deletion of Ahr
(termed AhrEN-KO mice). An AAV9 vector expressing Cre recombinase
under the control of the neuronal CaMKII promoter
(AAV9-CaMKII-Cre), was administered to conditional Ahr mutant (Ahr™) and control


(Ahr‘“) mice carrying the lineage reporter Rosa26eYFP, which enabled
us to monitor the efficiency and cell-type specificity of Ahr deletion
by GFP immunostaining (Extended Data Fig. 5n, o). Administration of
AAV9-CaMKII-Cre to Ahr ;Rosa26eYFP mice resulted in eYFP expression
and ablation of Ahr from the majority of enteric neurons (Extended Data
Fig. 5o, p). eYFP-expressing neurons, which as expected lack an AHR
RNAscope signal (Fig. 2o, p), showed significantly lower levels of Kcnj12
transcripts relative to eYFP-negative neurons (Fig. 2o-s). Together,
our experiments indicate that microbiota-dependent Ahr induction
in enteric neurons enables the cell-autonomous activation of genes
that encode previously known feedback regulators of AHR signalling
as well as newly identified regulators of enteric neuronal excitability.


[Fig. 2 image. Recoverable panel labels: Duodenum (b); AHR, HuC/D, Merge; PERI (c-e); exGF; Control, AHR agonist; Cyp1a1, Kcnj12; log10(Cyp1a1); EN-KO; eYFP, Kcnj12, Merge.]

Fig. 2 | Microbiota- and ligand-dependent activation of AHR signalling.
a, b, AHR (red) and HuC/D (blue) immunostaining of colon (a) and duodenum
(b) muscularis externa (12-week-old SPF mice). Note neurons with cytoplasmic
(arrowhead) or nuclear (arrow) signals. n = 12 mice, 3 experiments.
c-e, Immunostaining of SPF (c), germ-free (d) and conventionalized adult
germ-free mice (exGF) (e) colonic myenteric ganglia with the pan-neuronal
markers peripherin (PERI) (green), HuC/D (blue) and AHR (red). Small panels
show AHR (top) or HuC/D (bottom). n = 6 mice for each condition,
2 experiments. f, g, Colonic myenteric ganglia from control (f) and
3MC-treated (g) mice hybridized with the Cyp1a1 probe (green). Dotted line, borders
of myenteric ganglia; arrows, positive neurons. n = 4 mice, 2 experiments.
h, Quantitative PCR for Cyp1a1 transcripts (mean ± s.d.) in colonic muscularis
externa from control and 3MC-treated mice. n = 6 control and 8 3MC-treated
mice (two-sided non-parametric Mann-Whitney U-test). i, j, Colonic myenteric
ganglia from 3MC-treated mice hybridized with Cyp1a1 (green) (i) and Kcnj12
(blue) (j) probes. Arrows, neurons co-expressing Cyp1a1 and Kcnj12. n = 4 mice,
2 experiments. k, Positive correlation in Cyp1a1 and Kcnj12 transcript level in
myenteric neurons (F-test). n = 827 neurons, 4 mice. l, m, Colonic myenteric
ganglia from SPF (l) and germ-free (m) mice hybridized with the Kcnj12 probe.
Arrows, positive neurons. n = 6 SPF mice and 6 germ-free mice, 2 experiments.
n, Quantification of signal (mean ± s.d.) in l and m (two-sided non-parametric
Mann-Whitney U-test). n = 510 neurons from SPF mice and 298 neurons from
germ-free mice. o-q, Myenteric ganglia of AhrEN-KO mice immunostained for
eYFP (green) (o) and hybridized with the Ahr (red) (p) and Kcnj12 (blue) (q)
probes. o, Merge of signals in p and q. n = 4 mice, 2 experiments.
r, s, RNAscope signal intensity (mean ± s.d.) for Ahr (r) and Kcnj12 (s) in eYFP-
(AHR+) and eYFP+ (AHR-) neurons (two-sided non-parametric Mann-Whitney
U-test). n = 100 eYFP+ and 223 eYFP- neurons, 4 mice. Scale bars, 30 µm.


Fig. 3 | AHR signalling in enteric neurons regulates intestinal peristalsis.
a, Quantification (mean ± s.d.) of ITT in control and AhrEN-KO mice (two-sided
non-parametric Mann-Whitney U-test). n = 18 control and 17 AhrEN-KO mice.
b, Quantification (mean ± s.d.) of ITT in vehicle- and antibiotic-treated mice
(two-sided non-parametric Mann-Whitney U-test). n = 9 vehicle- and
8 antibiotic-treated mice. c, Representative ex vivo recorded colonic migrating
motor complexes from control (top) and AhrEN-KO (bottom) mice.
d, Quantification (mean ± s.d.) of colonic migrating motor complexes
(CMMCs) from control and AhrEN-KO mice (two-sided non-parametric
Mann-Whitney U-test). n = 6 control and 5 AhrEN-KO mice. e, The negative feedback
regulation of AHR signalling by CYP1A1 (left) is the basis for the experimental
design to assess the role of neuron-specific Cyp1a1 overexpression on
intestinal motility (right). f, g, Myenteric ganglia from the colon of
Rosa26'*"9P!! (f) and ENO” (g) mice hybridized with the Cyp1a1 RNAscope
probe (green). Dotted line defines the borders of myenteric ganglia and arrows


To determine whether AHR signalling in enteric neurons regulates
intestinal physiology, we analysed the gut motility of AhrEN-KO mice. The
identification of AHR-deficient neurons for at least four weeks after AAV
administration, along with the similar number, morphology and
neurochemical properties of colonic neurons in control and AhrEN-KO mice
(Extended Data Fig. 6a), indicated that Ahr is dispensable for neuronal
survival. Furthermore, no apparent deficit in the organization and
cellular composition of the ENS was observed in constitutive Ahr mutant
mice (Ahr-/-) (Extended Data Fig. 6b). Despite the lack of discernible
effects of Ahr deletion on the cellular organization of the ENS, the total
ITT of AhrEN-KO mice was increased relative to two sets of control mice:
AAV-injected wild-type mice (WT + AAV, which act as controls for the
potential effects of AAV on ITT) and Ahr“ mice (cohoused with AhrEN-KO
mice to minimize potential effects of microbiota) (Fig. 3a). WT + AAV
and Ahr mice were indistinguishable from one another in terms of
colon histology and motility (Extended Data Fig. 7a, b), but the increase
in ITT after enteric-neuron-specific deletion of Ahr was comparable to 
that observed in microbiota-depleted SPF mice (Fig. 3a, b). To confirm 
that Ahr activity in enteric neurons regulates the physiological output 
of intestinal neural circuits independently of extrinsic gut innervation, 
we used ex vivo spatiotemporal mapping of colon preparations to 
record the ENS-dependent colonic migrating motor complexes. The 
frequency and organization of colonic migrating motor complexes 
were reduced in AhrEN-KO mice (Fig. 3c, d), demonstrating that the 
cell-autonomous activity of Ahr in enteric neurons regulates the peristaltic 
activity of the colon. The CYP1A1-mediated clearance of natural AHR 
ligands predicts that constitutive upregulation of Cyp1a1 in enteric 
neurons would phenocopy the effect of neuron-specific deletion of 
Ahr on intestinal motility, thus demonstrating the ligand-dependent 
activity of AHR in enteric neurons. To test this idea, we administered 
the AAV9-CaMKII-Cre vector to mice homozygous for the Rosa26LSL-Cyp1a1 
allele, resulting in constitutive overexpression of Cyp1a1 specifically in 
enteric neurons (termed ENCyp1a1 mice) (Fig. 3e-g). As expected, ENCyp1a1 
mice had an increased ITT (Fig. 3h) that is similar to that observed for 
AhrEN-KO mice (Fig. 3a), which indicates that dysregulation of AHR-ligand 
metabolism in enteric neurons disrupts intestinal motility. To 
examine further the potential role of AHR ligands in gut motility, we 
supplemented the diet of ENCyp1a1 mice for four weeks with the AHR 
pro-ligand indole-3-carbinol (I3C), which generates the high-affinity 
ligand indolo[3,2-b]carbazole (ICZ). Exposure to I3C diet rescued, to 
a large extent, the dysmotility in ENCyp1a1 mice (Fig. 3i), which further 
demonstrates that the neuron-specific and ligand-dependent activation 
of AHR signalling regulates intestinal peristalsis. We suggest that, 
similar to ICZ, other natural ligands that originate in the gut lumen 
indicate positive neurons. Data represent two independent experiments. Scale 
bars, 30 μm. h, Quantification (mean ± s.d.) of the effect of neuron-specific 
Cyp1a1 overexpression on total ITT (two-sided non-parametric Mann-Whitney 
U-test). n = 17 control (WT + AAV and R26LSL-Cyp1a1) and 19 ENCyp1a1 mice. 
i, I3C-supplemented diet rescues the total ITT increase observed in ENCyp1a1 mice 
(two-sided non-parametric Mann-Whitney U-test) (mean ± s.d.). n = 5 control 
(purified diet), 4 ENCyp1a1 (purified diet), 9 control (I3C diet) and 8 ENCyp1a1 
(I3C-diet-fed) mice (female). j, Experimental design for expressing AHR in 
enteric neurons of microbiota-depleted mice. Wild-type SPF mice were 
injected with control (AAV-control) (top) or AHR-expressing (AAV-AHR) 
(bottom) AAV vectors and treated with antibiotics. All mice were fed with 
I3C-supplemented diet one week before ITT analysis. k, Quantification 
(mean ± s.d.) of the effect of combinations of AAV-control and AAV-AHR vectors 
with antibiotic treatment on total ITT (two-sided non-parametric 
Mann-Whitney U-test). n = 10 mice per group. 


and activate AHR in epithelial and immune cells in the gut wall are 
also capable of reaching nearby enteric neurons and their projections, 
modulating their transcriptional profile in an AHR-dependent manner. 

Finally, to provide direct evidence that AHR signalling is implicated 
in the regulation of intestinal motility by microbiota, antibiotic-treated 
wild-type mice, which show reduced expression of Ahr in enteric neurons 
(Extended Data Fig. 4m-r) and a longer ITT (Fig. 3b), were injected 
with AAV vectors expressing an AHR cassette under the control of the 
CaMKII promoter (AAV9-CaMKII-Ahr; AAV-AHR) or control vectors 
(AAV-control), and intestinal peristalsis was evaluated four weeks later. 
Because depletion of the microbiota is likely to reduce the amount of 
available AHR ligands, mice were also fed with I3C-supplemented 
diet for one week before the motility assay (Fig. 3j). As expected, 
antibiotic treatment of mice injected with AAV-control showed a marked 
increase in ITT, but injection of AAV-AHR resulted in a significant 
reduction of total transit time (Fig. 3k). However, the partial rescue that we 
observed suggested the presence of additional microbiota-dependent 
neuromodulators, such as serotonin (which is produced by enterochromaffin 
cells and modulates intestinal peristalsis). Together, our experiments 
demonstrate that AHR signalling in enteric neurons regulates 
the motor output of intestinal neural circuits. 

In this study, we reveal regulatory mechanisms of enteric neurons 
that link the luminal microenvironment of the gut with ENS function. 
The systematic comparison of neuronal transcriptomes that represent 
distinct intestinal segments and microbiota states of mice enabled us 
to identify the transcription factor AHR as the fulcrum of an ENS-specific 
surveillance pathway that regulates intestinal peristalsis in response 
to microbial colonization. Furthermore, the identification of genes 
that regulate neuronal excitability (Kcnj12) as a class of AHR signalling 
targets provides a plausible molecular link between the dynamic 
microenvironment of the gut lumen and the functional output of intestinal 
neural circuits. Although the full spectrum and detailed molecular 
mechanisms that act downstream of neuronal AHR remain to be characterized, 
our experiments suggest that components of this pathway are 
implicated in the pathogenesis of intestinal motility disorders. A recent 
study has suggested a role for AHR in the biology of stool frequency, 
changes of which are one of the hallmarks of irritable bowel syndrome. 
Therefore, pharmacological or dietary interventions that modulate 
AHR activity in the cellular circuitry that controls intestinal peristalsis 
offer a realistic strategy for the management of conditions associated 
with gut dysmotility. In addition to the role of AHR in neurogenic motility, 
AHR-dependent transcriptional programs are also central to the 
barrier function of intestinal epithelial cells and the mucosal immune 
system. We suggest that, by transmitting environmental triggers 
within diverse cell types, AHR integrates the activity of functionally 
distinct intestinal tissues towards gut homeostasis and host defence. 
Methods 


No statistical methods were used to predetermine sample size. The 
experiments were not randomized and investigators were not blinded 
to allocation during experiments and outcome assessment. 


Mice 

All mouse procedures at the Francis Crick Institute were carried out in 
accordance with the regulatory standards of the UK Home Office and 
approved by the local Animal Welfare and Ethics Review Body (AWERB). 
Procedures at the University of Bern were performed in accordance 
with Swiss Federal Regulations. 

The following transgenic lines have previously been described: 
Cyp1a1::cre (ref.*"), Ahr−/− (ref.”°), Ahrfl/fl (ref. 72), Rosa26LSL-Cyp1a1 (ref. 1°), 
Rosa26eYFP (ref. ”), Wnt1::cre (ref. **) and Ai95D(RCL-GCaMP6f)-D 
(ref. **). Details about the generation of the ChAT-TVA-mCherry mice 
have not been published; information about this line is available from 
V.P. upon request. Transgenic and wild-type C57BL/6 mice were bred 
and maintained in the SPF facility of the Francis Crick Institute. For 
most experiments, mice were on standard Crick diet. For the intestinal 
motility rescue experiment, mice were fed with purified diet supplemented 
with I3C (200 mg/kg) (Sniff Spezialdiäten), which generates 
metabolites representing physiological AHR ligands. 

Wild-type and germ-free mice (C57BL/6) were bred and maintained 
in flexible-film isolators at the Clean Mouse Facility of the University 
of Bern (Switzerland). Germ-free status was monitored routinely by 
culture-based and other methods, and all mice were independently 
confirmed to be free of microorganisms. For bacterial colonization 
experiments, faecal contents of SPF mice were orally administered 
to wild-type germ-free mice and colonized mice were cohoused and 
maintained in the SPF facility of the University of Bern for four weeks 
before the analysis. 

For depletion of the microbiota, wild-type (C57BL/6) mice were 
administered a 4-mM acetic acid solution containing 1 g/l ampicillin 
sodium, 0.5 g/l vancomycin hydrochloride, 1 g/l neomycin sulphate, 
1 g/l metronidazole (all from Sigma-Aldrich) and 1% (v/w) artificial 
sweet flavour (Vimto) via drinking water for 18 days (Fig. 3k) or 30 days 
(Fig. 3b, Extended Data Figs. 4m-r, 5h-j). 
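
As a worked example of the arithmetic behind this cocktail, the sketch below scales the stated per-litre concentrations to an arbitrary bottle volume. The function name `masses_for_volume` and the 250-ml bottle are illustrative assumptions, not part of the protocol; the acetic acid and flavouring are omitted.

```python
# Antibiotic concentrations as stated in the Methods (grams per litre).
COCKTAIL_G_PER_L = {
    "ampicillin sodium": 1.0,
    "vancomycin hydrochloride": 0.5,
    "neomycin sulphate": 1.0,
    "metronidazole": 1.0,
}


def masses_for_volume(volume_l):
    """Return grams of each antibiotic needed for `volume_l` litres of water."""
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    return {name: conc * volume_l for name, conc in COCKTAIL_G_PER_L.items()}


if __name__ == "__main__":
    # Hypothetical 250-ml drinking bottle.
    for name, grams in masses_for_volume(0.25).items():
        print(f"{name}: {grams * 1000:.0f} mg")
```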

For systemic administration of AAV vectors”, 6-week-old mice were 
intravenously injected with AAV particles (dPCR GC titre > 1 10" GC) 
in 5% sucrose buffer (150 μl per mouse). 

For activation of enteric neuronal AHR signalling, 8-week-old mice 
were injected intraperitoneally with the AHR agonist 3MC (26.5 mg/kg 
mouse), which is metabolized less efficiently in comparison to I3C. 
After treatment for 14 h (Fig. 2f-k, Extended Data Fig. 5a) or 5 days 
(Extended Data Fig. 5b-d), colonic muscular layers were isolated for 
nuclear purification or immunohistochemistry. 

For our experiments we used male mice, unless specified otherwise 
in the Methods or relevant figure legends. All mice were between 8 
and 16 weeks old. 


Generation of AAV vectors 

For the generation of the AAV vector expressing eGFP fused to the 
KASH nuclear membrane retention domain (eGFP-KASH) under 
the control of the neuron-specific CaMKII promoter (AAV-CaMKII-eGFP-KASH), 
the DNA cassette encoding KASH-tagged eGFP (eGFP-KASH) 
was amplified from the PX552 plasmid (Addgene, 60958) by 
Phusion High-Fidelity DNA polymerase. PCR was performed using the 
5′-GCTATCGGATCCGCCACCATGGTG-3′ and 5′-GCTATCGAATTCCTAGGTGGGAGG-3′ 
primers, which enabled us to use the BamHI and EcoRI 
restriction sites to substitute eGFP-KASH for eGFP in plasmid 
pAAV-CaMKII-eGFP (Addgene, 50469), thus generating pAAV-CaMKII-eGFP-KASH 
(Fig. 1a). NEB stable competent Escherichia coli (New England 
Biolabs, C3040I) was transformed with pAAV-CaMKII-eGFP-KASH, 
and bacteria were grown on LB agar plates containing ampicillin 
(50 mg/ml). The presence of inverted terminal repeats was confirmed 
by SmaI digestion. pAAV-CaMKII-eGFP-KASH has been deposited to 
Addgene (124882). Large-scale packaging of AAV-CaMKII-eGFP-KASH, 
AAV-CaMKII-Cre (Addgene, 105558) and AAV-CaMKII-eGFP (Addgene, 
105541) vectors into serotype 9 capsid (AAV9) was carried out by Penn 
Vector Core. AAV9-CaMKII-AHR was generated by Vector Biolabs. 
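
The cloning strategy relies on each primer carrying one restriction site. A minimal sketch of how that could be checked is given below; it assumes the whitespace inside the second primer in the extracted text is a line-break artefact, and the helper name `find_sites` is hypothetical. The recognition sequences used are the standard ones for BamHI (GGATCC) and EcoRI (GAATTC).

```python
# Recognition sequences of the two enzymes named in the text.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}

# Primer sequences as given in the Methods (5' to 3'), whitespace removed.
FORWARD = "GCTATCGGATCCGCCACCATGGTG"
REVERSE = "GCTATCGAATTCCTAGGTGGGAGG"


def find_sites(primer):
    """Return {enzyme: 0-based start position} for each site found in `primer`."""
    primer = primer.upper()
    return {name: primer.find(seq) for name, seq in SITES.items() if seq in primer}


if __name__ == "__main__":
    print("forward primer:", find_sites(FORWARD))
    print("reverse primer:", find_sites(REVERSE))
```

In both primers the site follows a six-nucleotide 5′ overhang, which is a common design choice to help the enzyme cut near the end of a PCR product.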


Histopathological analysis 

Intestinal segments were collected from mice infected with AAV9 vectors, 
fixed with 4% paraformaldehyde (PFA) in PBS overnight at 4 °C 
and embedded in paraffin. Sections were deparaffinized with 
xylene, rehydrated with ethanol and stained with either haematoxylin 
and eosin or Alcian blue-PAS haematoxylin. 


Immunohistochemistry 

For immunostaining, the intestine was flushed free of luminal contents, 
cut along the mesenteric border following removal of adipose tissues 
on serosa and fixed with 4% PFA overnight at 4 °C. For the preparation 
of the submucosal plexus layer, a 1-ml pipette was inserted into the 
lumen to fully extend the smooth muscle layer containing myenteric 
plexus, which was removed from the mucosal compartment using 
cotton buds as previously described. The submucosal plexus layer 
was treated with 30 mM EDTA in PBS for 30 min on ice to remove the 
epithelial layer, stretched on a Sylgard-coated Petri dish and fixed with 
4% PFA for 3 h at 4 °C. The fixed tissues were then rinsed three times 
with PBS at room temperature. Gut tissues were then permeabilized 
and preblocked with 10% normal donkey serum (NDS) and 1% Triton 
X-100 in PBS for 1 h at room temperature and incubated with primary 
antibodies (listed in the Reporting Summary) in the same buffer for 
48 h (at 4 °C). Tissue was then incubated with secondary antibodies 
(listed in the Reporting Summary) in 10% NDS and 1% Triton X-100 for 
12 h at room temperature. DAPI (Molecular Probes, D3571) was used 
to counterstain the nucleus. Samples were washed with PBS before 
mounting with VECTASHIELD (Vector Laboratories). 


Image processing 

Immunostained gut preparations were examined with an Olympus 
FV3000-Invert (SW312-CB1) confocal laser scanning microscope and 
FV31S-SW software (Olympus) using standard excitation and emission 
filters for visualizing DAPI, Alexa Fluor 488, Alexa Fluor 568 and 
Alexa Fluor 647. All images were processed with Adobe Photoshop CS 
8.0 (Adobe Systems) while analyses were performed using the 
image-processing package Fiji and ImageJ (W. Rasband, NIH). 


Fluorescence in situ hybridization 

Fluorescence in situ hybridization on the myenteric plexus was carried 
out using the Advanced Cell Diagnostics RNAscope Fluorescent 
Multiplex Kit (ACD, 320850) according to the manufacturer's specifications. 
In brief, the muscularis externa layer of the gut was dehydrated 
by serial ethanol treatments and treated with RNAscope Protease III 
for 30 min (for small intestine) or 45 min (for colon) at room temperature. 
Tissue was then incubated overnight (at 40 °C) with fluorescent 
probes, 3-plex positive control probe, 3-plex negative control probe or 
customized probes. Following hybridization, tissue was washed twice 
with wash buffer and then subjected to sequential hybridization with 
pre-amplifier, amplifier DNA (amp 1-FL, amp 2-FL and amp 3-FL) and 
fluorophore (amp 4 alt A-FL) at 40 °C for 15-30 min for each step. After 
hybridization, tissues were counterstained for the pan-neuronal marker 
(HuC/D) and mounted on Superfrost Plus Adhesion Microscope Slides 
(ThermoFisher Scientific, 10149870) using VectaMount Permanent 
Mounting Medium (ACD, 321584). RNAscope probes (Advanced Cell 
Diagnostics) used in this study include Ret-C1 (431791), Ahr-C1 (452091), 
Cyp1a1-C1 (464611), Pou3f3-C1 (441521), Pde1c-C1 (489011), Ano5-C1 
(557141), Pantr2-C1 (483721), Prr5-C1 (557121), Fam20c-C1 (453351), 
Unc5d-C2 (480461), Col25a1-C3 (538511) and Kcnj12-C3 (525171). 


Quantification of RNAscope intensity on enteric neurons 
Fluorescent signals of RNAscope were colocalized with HuC/D+ enteric 
neurons using an automated pipeline in CellProfiler. In brief, the Alexa 
Fluor 405 channel (HuC/D) was background-corrected using a baseline 
subtraction of 10% of the maximum pixel intensity. The 
background-subtracted image was then segmented and individual neurons were 
detected using the IdentifyPrimaryObjects module and clumped 
objects were separated by shape. RNAscope spots (2-30 pixels) were 
identified separately for each gene and were processed further only 
if colocalized to HuC/D+ neurons. Individual neurons were tracked 
through the z-stack by measured overlap. The RNAscope signal inten- 
sity of individual neurons was integrated through the image z-stack in 
slices in which the cells were identified. 
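
Two steps of this pipeline can be illustrated outside CellProfiler. The sketch below is not the authors' pipeline (which is referenced under Code availability); it is a plain-Python illustration, with hypothetical function names, of (1) baseline subtraction at 10% of the maximum pixel intensity on the HuC/D channel and (2) integration of RNAscope signal inside one neuron's mask across the z-slices in which that neuron was identified. Segmentation and spot-size filtering are assumed to have been done elsewhere.

```python
def subtract_baseline(image, fraction=0.10):
    """Subtract `fraction` of the image maximum from every pixel, clipping at zero."""
    peak = max(max(row) for row in image)
    baseline = fraction * peak
    return [[max(0.0, px - baseline) for px in row] for row in image]


def integrate_signal(z_stack, neuron_masks):
    """Sum RNAscope pixel values inside one neuron's mask across z-slices.

    z_stack      -- list of 2D images (one per slice) of the RNAscope channel
    neuron_masks -- list of 2D boolean masks for the same neuron, or None
                    for slices in which the neuron was not identified
    """
    total = 0.0
    for image, mask in zip(z_stack, neuron_masks):
        if mask is None:  # neuron not detected in this slice
            continue
        for img_row, mask_row in zip(image, mask):
            total += sum(px for px, inside in zip(img_row, mask_row) if inside)
    return total
```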


Quantitative PCR 

Total RNA was isolated from the colonic muscular layer using Trizol LS 
reagent and the PureLink RNA Micro Kit (Invitrogen, 12183016) according 
to the manufacturer's specifications, and was subjected to reverse 
transcription using the High-Capacity cDNA Reverse Transcription Kit 
(Applied Biosystems, 4368814). Quantitative PCR was performed with 
complementary DNA (cDNA) using Taqman fast universal 2x PCR Master 
Mix (Applied Biosystems) and Taqman probes (Applied Biosystems) 
for Actb (Mm02619580_g1) and Cyp1a1 (Mm00487218_m1). Ct values 
obtained were normalized to Actb. 
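
The text states only that Ct values were normalized to Actb. One common way to express such a normalization is the ΔCt method, sketched below; the 2^(−ΔCt) conversion to relative expression and the assumption of roughly 100% amplification efficiency are ours, not stated in the Methods, and the Ct values in the example are invented.

```python
def delta_ct(ct_target, ct_reference):
    """Ct of the target gene minus Ct of the reference gene (here Actb)."""
    return ct_target - ct_reference


def relative_expression(ct_target, ct_reference):
    """Expression of the target relative to the reference, as 2^(-delta Ct).

    Assumes the template doubles every cycle for both assays.
    """
    return 2.0 ** (-delta_ct(ct_target, ct_reference))


if __name__ == "__main__":
    # Hypothetical Ct values for Cyp1a1 and Actb in one sample.
    print(relative_expression(ct_target=24.0, ct_reference=20.0))
```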


Purification of neuronal nuclei from the myenteric plexus 
For the isolation of neuronal nuclei from the myenteric plexus, mice 
were injected intravenously with AAV9-CaMKII-eGFP-KASH. Five weeks 
after intravenous injection of the AAV, the longitudinal smooth muscle 
layer and associated myenteric plexus were peeled off the wall of the 
small intestine and colon and subjected to Dounce homogenization 
in lysis buffer (250 mM sucrose, 25 mM KCl, 5 mM MgCl2, 10 mM Tris 
buffer with pH 8.0, 1 mM DTT) containing 0.1% Triton X-100, cOmplete 
EDTA-free protease inhibitor (Sigma-Aldrich) and DAPI. After filtering 
the homogenate to remove large debris, samples were centrifuged 
at 1,000g for 10 min at 4 °C to obtain a pellet containing muscularis 
externa nuclei. For flow cytometric analysis, doublet discrimination 
gating was applied to exclude aggregated nuclei, and intact nuclei were 
determined by subsequent gating on the area and height of DAPI intensity. 
Both eGFP+ and eGFP− nuclear populations were collected directly 
into a 1.5-ml tube containing Trizol LS reagent (Invitrogen) using, at the 
Francis Crick Institute, the Aria Fusion cell sorter (BD Biosciences) and, 
at the University of Bern, the Aria III (BD Biosciences). The obtained 
FCS data were further analysed using FlowJo software version 10.5.3. 
The list of SPF CUEGs was generated by combining nRNA-seq data 
from mice housed in both facilities. Labelling and isolation of neuronal 
nuclei in the two facilities was carried out using exactly the same pro- 
tocol and reagents, except for the use of two different FACS sorters 
(Aria Fusion at the Francis Crick Institute and Aria III in Bern). RNA 
extraction and RNA sequencing for all samples was done at the Francis 
Crick Institute. The list of germ-free CUEGs was generated with mice 
sourced exclusively from the Bern germ-free facility. The identification 
of microbiota-dependent CUEGs was done by comparing the transcrip- 
tome of colonic neurons from SPF and germ-free mice from the Bern 
facility. Finally, the transcriptomic experiment that generated the list 
of AHR-induced CUEGs was carried out using exclusively Crick mice. 


RNA sequencing and bioinformatic analysis 

Extraction of nuclear RNA and nuclear RNA sequencing were carried out 
at the Francis Crick Institute. Nuclear RNA was isolated using PureLink 
RNA Micro Kit (Invitrogen, 12183016) according to the manufacturer's 
instructions. Double-stranded full-length cDNA was generated using 
the Ovation RNA-Seq System V2 (NuGen Technologies). Following 
quantification on a Qubit 3.0 fluorometer (Thermo Fisher Scientific), 
cDNA was fragmented to 200 bp by acoustic shearing using a Covaris 
E220 instrument (Covaris) at standard settings. The fragmented cDNA 
was then normalized to 100 ng, which was used for sequencing library 
preparation using the Ovation Ultralow System V2 1-96 protocol (NuGen 
Technologies). A total of 7 PCR cycles was used for library amplifica- 
tion. The quality and quantity of the final libraries were assessed with 
TapeStation D1000 Assay (Agilent Technologies). The libraries were 
then normalized to 2.5 nM, pooled and loaded onto a HiSeq 4000 
(Illumina) to generate 75-bp single-end reads. For the transcriptomic 
comparison of colon to small-intestine myenteric neurons from SPF 
mice (Fig. 1b) we used eight nuclear isolates (four from Crick, and four 
from Bern, mice), each representing three mice. All mice in the Crick 
samples were male. Two of the Bern samples were generated from male 
and two from female mice. Male and female samples were indistinguishable 
by principal component analysis. The transcriptomic comparison 
of colon to small-intestine myenteric neurons from germ-free mice 
(Fig. 1c) was generated using three nuclear isolates (all Bern mice), 
each representing three mice. 

For the bioinformatics analysis, the ‘Trim Galore!’ utility version 
0.4.2 was used to remove sequencing adaptors and to quality trim 
individual reads with the q-parameter set to 20. The sequencing reads 
were then aligned to the mouse genome and transcriptome (Ensembl 
GRCm38 release-86) using RSEM version 1.3.0 in conjunction with the 
STAR aligner version 2.5.2. Sequencing quality of individual samples 
was assessed using FASTQC version 0.11.5 and RNA-SeQC version 1.1.8. 
Differential gene expression was determined using the R Bioconductor 
package DESeq2 version 1.14.1. 


ITT assay 

The total ITT was measured as previously described. Mice were placed 
individually in bedding-free cages with the diet and HydroGel (ClearH2O). 
For all experiments, 250 μl of 6% (w/v) carmine red dye (Sigma-Aldrich) 
in 0.5% (w/v) methylcellulose (Sigma-Aldrich) was orally administered 
to each mouse at 09:00. The time period from gavage until 
the emergence of the first red-colour pellet was recorded as total ITT. 
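
The ITT readout is a simple time difference. A minimal sketch of the calculation follows; the function name and the example date and pellet time are illustrative assumptions, with only the 09:00 gavage time taken from the text.

```python
from datetime import datetime


def total_itt_minutes(gavage_time, first_red_pellet_time):
    """Minutes elapsed between gavage and the first red-coloured pellet."""
    elapsed = (first_red_pellet_time - gavage_time).total_seconds()
    if elapsed < 0:
        raise ValueError("pellet time precedes gavage time")
    return elapsed / 60.0


if __name__ == "__main__":
    gavage = datetime(2020, 1, 1, 9, 0)        # gavage at 09:00 (date is arbitrary)
    pellet = datetime(2020, 1, 1, 12, 30)      # hypothetical observation
    print(total_itt_minutes(gavage, pellet))
```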


Live video imaging and spatiotemporal mapping of colonic 
motility 

Ex vivo video imaging and analysis of colonic motility was performed as 
previously described. In brief, the entire colon was carefully isolated 
and loosely pinned in an organ bath chamber continuously superfused 
(flow rate of 4 ml/min) with oxygenated (95% O2 and 5% CO2) Krebs 
solution (in mM, 120.9 NaCl, 5.9 KCl, 1.2 MgCl2, 2.5 CaCl2, 1.2 NaH2PO4, 
14.4 NaHCO3, 11.5 glucose) kept at 37 °C. Following equilibration of 
the colon for 30 min, movies of colonic motility were captured (2.5-Hz 
frame rate) with a QICAM-Fast camera using QCapture Pro 6.0 software 
(Q-Imaging). Images were read into Igor Pro (WaveMetrics) and analysed 
using custom-written algorithms. The edges of the bowel were 
determined, and the width computed and mapped over time. From 
the generated spatiotemporal maps, the frequency of propagating 
contractions was determined. 
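
The custom Igor Pro algorithms are not reproduced here. As a simplified illustration of how a contraction frequency can be read from a width-over-time signal, the sketch below counts contraction onsets at a single position along the colon, using the 2.5-Hz frame rate from the text. The fixed width threshold is an assumption, and the sketch does not test whether a contraction propagates, which the full spatiotemporal analysis would.

```python
def count_contractions(widths, threshold):
    """Count onsets: frames where the width first drops below `threshold`."""
    onsets = 0
    contracted = False
    for w in widths:
        if w < threshold and not contracted:
            onsets += 1            # new contraction begins
            contracted = True
        elif w >= threshold:
            contracted = False     # bowel has relaxed again
    return onsets


def contraction_frequency(widths, threshold, frame_rate_hz=2.5):
    """Contractions per minute for a width trace sampled at `frame_rate_hz`."""
    duration_min = len(widths) / frame_rate_hz / 60.0
    return count_contractions(widths, threshold) / duration_min
```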


Ca2+ imaging of colonic myenteric plexus 

For ex vivo Ca2+ imaging experiments, the large intestine of adult 
Wnt1::cre;Rosa26-GCaMP6f mice was isolated and pinned flat in a 
Sylgard-lined dish filled with Krebs solution, bubbled with 95% O2 and 5% 
CO2 at room temperature. The mucosal, submucosal and longitudinal 
muscle layers were carefully removed to obtain a circular muscle with 
adherent myenteric-plexus preparation, which was mounted over a 
small inox ring, immobilized by a matched rubber O-ring. Myenteric-plexus 
preparations were placed in a recording chamber mounted 
on an upright Zeiss Axio Examiner.Z1 microscope equipped with a 
Poly V xenon monochromator (TILL Photonics) and water dipping 
lens (20x, NA 1.0, Zeiss). GCaMP6f was excited at 475 nm and images 
were recorded at 525/50 nm (at 2 Hz) on a Sensicam-QE CCD camera 
(PCO) using TillVisION (TILL Photonics). Myenteric plexus preparations 
were constantly superfused with carbogenated Krebs solution 
at room temperature containing 1 μM nifedipine (Sigma-Aldrich) via a 
local gravity-fed (+1 ml/min) perfusion pipette. Myenteric ganglia were 
stimulated electrically using single pulses and a train (20 Hz, 2 s) of pulses 
(300 μs) transmitted from a Grass stimulation unit via a focal electrode 
(50-μm diameter tungsten wire) placed on an interganglionic connective 
leading to the selected myenteric ganglia within the field of view. 

Analysis was performed with custom-written routines in Igor Pro 
(Wavemetrics). Regions of interest were drawn, after which the 
average Ca2+ signal intensity was calculated, normalized to the initial 
GCaMP6f values and reported as F/F0. Cells were considered as responders 
when the GCaMP6f signal rose above baseline plus 3x the intrinsic 
noise (standard deviation) during the recording. The Ca2+ transient 
amplitudes were measured as the maximum increase in [Ca2+]i above 
baseline (maximum Fi/F0). 
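
The responder criterion and amplitude measure described above can be sketched as follows. This is not the Igor Pro code referenced under Code availability; the function names, the use of the first `n_baseline` frames as the pre-stimulus baseline, and the choice of the population standard deviation for the noise estimate are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def normalize_trace(trace, n_baseline):
    """Express a fluorescence trace as F/F0, with F0 the mean of the baseline frames."""
    f0 = mean(trace[:n_baseline])
    return [f / f0 for f in trace]


def is_responder(norm_trace, n_baseline, k=3.0):
    """True if the post-baseline signal exceeds baseline mean + k x baseline s.d."""
    base = norm_trace[:n_baseline]
    threshold = mean(base) + k * pstdev(base)
    return max(norm_trace[n_baseline:]) > threshold


def transient_amplitude(norm_trace, n_baseline):
    """Maximum increase of the normalized signal above its baseline mean."""
    return max(norm_trace[n_baseline:]) - mean(norm_trace[:n_baseline])
```

For example, a trace whose first four frames hover around 100 and which peaks at 150 after stimulation would be classed as a responder with an amplitude of about 0.5 in F/F0 units.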


Statistical analysis 

Statistical comparisons between samples were performed in GraphPad 
Prism software using Student’s t-test. When variances were not homo- 
geneous, the data were analysed by the non-parametric Mann-Whitney 
U-test. For correlation analysis of RNAscope intensity (Fig. 2k, Extended 
Data Fig. 5m), data were filtered to remove pairs for which either observation 
was zero and data were then log10-transformed. The correlation 
coefficient was calculated by fitting a standard linear regression model 
to these log10-transformed data and quality of the overall model was 
assessed with an F-test. Data excluded from the analysis were overlaid 
on the scatter plot by log10-transforming the non-zero observation 
and retaining the zero value. The analysis was performed using R 3.3.1. 
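
The correlation procedure can be sketched in a few lines. The original analysis was run in R 3.3.1; the Python version below is an illustrative re-expression with hypothetical function names. It drops pairs containing a zero, log10-transforms the rest, fits an ordinary least-squares line and reports the F statistic for a simple regression (1 and n − 2 degrees of freedom). Converting F to a P value would need an F-distribution routine, which is left out.

```python
import math


def filter_and_log(xs, ys):
    """Drop pairs where either value is zero, then log10-transform both series."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x != 0 and y != 0]
    return [math.log10(x) for x, _ in pairs], [math.log10(y) for _, y in pairs]


def linear_fit(xs, ys):
    """Least-squares fit of ys on xs; returns (slope, intercept, r, F statistic)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    r2 = r * r
    # F = r^2 (n - 2) / (1 - r^2); infinite for a perfect fit.
    f_stat = r2 * (n - 2) / (1.0 - r2) if r2 < 1.0 else math.inf
    return slope, intercept, r, f_stat
```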


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


All RNA-seq data are available at Gene Expression Omnibus (GEO) under 
accession number GSE140293. Source Data for Figs. 2, 3 and Extended 
Data Figs. 1-5, 7 are provided with the paper. All datasets analysed 
during the current study are presented in this manuscript, or are 
available from the corresponding authors upon reasonable request. 


Code availability 

The source code and installation instructions for colonic migrating 
motor complex evaluation and Ca2+ imaging can be found at 
https://doi.org/10.7554/eLife.42914.039 (Ca2+ imaging analysis source code) 
and https://doi.org/10.7554/eLife.42914.040 (installation instructions 
and user guide). For more information, please contact 
pieter.vandenberghe@kuleuven.be. The code related to the RNAscope signal 
quantification is available at GitHub 
(https://github.com/FrancisCrickInstitute/Pachnis-lab/tree/master/Neuronal-programming-Nature). 
Extended Data Fig. 1 | AAV-based transcriptional profiling of enteric 
neurons. a-g, Representative images of myenteric ganglia from mice injected 
with the AAV9-CaMKII-eGFP-KASH vector. Colon (a) and small intestine (b-g) 
myenteric-plexus preparations were immunostained with antibodies against 
eGFP (a-g), PGP9.5 (a, b), S100B (c), SOX10 (d), nNOS and calretinin (CALR) (e), 
nNOS and calbindin (CALB) (f), and VIP and ChAT (g). Data represent two 
independent experiments. Scale bars, 100 μm (a, b) and 30 μm (c-g). 
h, Percentage of PGP9.5+ enteric neurons (mean ± s.d.) in the proximal small 
intestine and colon expressing eGFP, following intravenous administration of 
the AAV9-CaMKII-eGFP-KASH vector. n = 1,308 colon neurons and 784 
small-intestine neurons from 3 mice. i, FACS plots indicating the gating parameters 
for the isolation of muscularis externa nuclei (gated on DAPI) from the colon 
(left) and small intestine (right) of mice injected intravenously with 
AAV9-CaMKII-eGFP-KASH. j, k, Peripherin (red) and eGFP (green) whole-mount 
immunostaining of the colon (j) and small intestine (k) of mice injected with 
AAV9-CaMKII-eGFP-KASH following dissection of the muscularis externa. 
The identification of an intact submucosal plexus demonstrates that our 
transcriptomic analysis is specific for myenteric neurons. Images represent 
two independent experiments. Scale bars, 100 μm. l, Volcano plots showing 
mean log2-transformed fold change (x axis) and significance (−log10(adjusted 
P value)) of differentially expressed genes between eGFP+ and eGFP− nuclei 
isolated from the colon (left) and small intestine (right) of mice injected with 
AAV9-CaMKII-eGFP-KASH vector. Coloured dots indicate genes specific to 
enteric neurons (Ret, Chat, Camk2a, Elavl3, Elavl4, Nos1 and Tubb3) in red, glial 
cells (Sox10, Gfap, Cdh19, Entpd2, S100b and Plp1) in blue and muscular 
macrophages (Itgam, Cd163, H2-Ab1, Mrc1 and Retnla) in green. n = 4 mice 
(Crick). m, Principal component analysis of the transcriptomes of eGFP+ 
(neuronal) and eGFP− (non-neuronal) nuclei isolated from the muscularis 
externa of the colon and small intestine of mice injected with 
AAV9-CaMKII-eGFP-KASH vectors. Segregation of nuclear transcriptomes according to their 
neuronal versus non-neuronal origin and anatomical location along the gut. 
n = 4 mice (Crick). 
Extended Data Fig. 2 | Differential expression of enteric-neuron-specific 
genes in myenteric neurons from the small intestine and colon of SPF and 
germ-free mice. a, b, Representative images of myenteric ganglia (outlined by 
dotted line) from small intestine (left) and colon (right) of SPF (a) and germ-free 
(b) mice hybridized with the indicated fluorescence RNAscope probes and 
counterstained for the pan-neuronal marker HuC/D. Ret (positive control for 
RNAscope detection) is expressed in neurons of myenteric ganglia of both the 
small intestine and the colon. Pou3f3, Pde1c, Pantr2, Ano5, Unc5d and Col25a1 
are expressed at higher levels in colonic versus small intestine neurons in both 
SPF (a) and germ-free (b) mice. Data represent three independent experiments. 
Transcripts per kilobase million (TPM) values (mean ± s.d.) for each transcript 
in small intestine and colon neurons from SPF (a) and germ-free (b) mice. 
n = 8 SPF and 3 germ-free mice. Scale bars, 30 μm. 
Extended Data Fig. 3 | Molecular and neurochemical characterization of 
colonic neurons in germ-free mice. a, TPM values (mean ± s.d.) for neuronal 
gene markers Elavl4, Uchl1, Prph, Chat, Vip, Nos1, Calb2 and Nefm in the 
muscularis externa of the colon of SPF and germ-free mice. n = 4 SPF and 
3 germ-free mice. b-f, Immunostaining of colonic myenteric ganglia from 
germ-free (top) and SPF (bottom) mice with VIP, CALR and HuC/D (b), CALR, 
nNOS and TuJ1 (c), VIP, nNOS and CALB (d), PGP9.5, ChAT and TuJ1 (e) and 
peripherin and NF-M (f). Scale bars, 30 μm. Data represent three independent 
experiments. 
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Extended Data Fig. 4 | See next page for caption. 
Extended Data Fig. 4 | Microbiota-dependent expression of AHR in colonic
neurons. a, TPM values (mean ± s.d.) for AHR transcripts in neuronal and
non-neuronal nuclear preparations from muscularis externa from colon and small
intestine of SPF and germ-free mice. n=4 (SPF) and 3 (germ-free) mice.
b, c, Myenteric ganglia immunostained for KIT (which identifies interstitial
cells of Cajal) and AHR (b) or SOX10 (enteric glial cells) and AHR (c). AHR+ cells
are distinct from interstitial cells of Cajal and enteric glia. Scale bars, 30 μm.
d-f, Immunostaining of neurons from the colon of wild-type mice for AHR (d-f)
and the neuronal markers peripherin and NF-M (d), calbindin and nNOS (e), and
calretinin and HuC/D (f). AHR signal was detected in all subtypes of myenteric
neurons (arrowheads). Scale bars, 30 μm. g, Immunostaining of neurons from
the colon of ChAT-mCherry-TVA reporter mice for mCherry (red), AHR (green)
and HuC/D (blue). Arrowhead indicates an enteric neuron positive for ChAT
and AHR. Scale bar, 30 μm. h, i, Immunostaining of myenteric ganglia from the
jejunum (h) and ileum (i) with the pan-neuronal marker peripherin (blue) and
AHR (red). Scale bar, 30 μm. j, k, Representative images of enteric ganglia from
duodenum (j) and colon (k) hybridized with RNAscope probe for Ahr (green).
Dotted line defines the borders of myenteric ganglia. Scale bar, 30 μm.
l, Quantification (mean ± s.d.) of RNAscope signal per neuron is shown
(two-sided non-parametric Mann-Whitney U-test). n=91 small-intestine and
254 colon neurons from 6 mice. m-o, Immunostaining of ganglia from the
colon of control (m), antibiotic-treated (n) and microbiota-colonized,
antibiotic-treated (o) mice with peripherin (blue) and AHR (red). Small panels
show signal for AHR (top) and peripherin (bottom). n=3 mice for each
condition. Scale bars, 30 μm. p, q, Representative images of enteric ganglia
from the colon of control (p) and antibiotic-treated (q) mice hybridized with
RNAscope probe for Ahr (green). Dotted line defines the borders of myenteric
ganglia and arrows indicate positive cells. Scale bars, 30 μm. r, Quantification
(mean ± s.d.) of RNAscope signal per neuron is also shown (two-sided
non-parametric Mann-Whitney U-test). n=518 neurons from 4 control and
468 neurons from 4 antibiotic-treated mice. Data represent two (j, k, m-q) or
three (b-i) independent experiments.
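
The per-neuron comparisons in panels l and r rely on a two-sided Mann-Whitney U-test. The caption does not say which software was used, so the following is only a minimal sketch of the test itself, assuming the large-sample normal approximation with a tie correction (reasonable for hundreds of neurons, not for very small samples). The function name and the signal values are hypothetical.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0  # sum of ranks belonging to the first sample
    tie_term = 0.0    # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every pooled value is identical
        return min(u1, u2), 1.0
    # Continuity-corrected z score, clamped at 0 so that P never exceeds 1.
    z = max(abs(u1 - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p = 2.0 * (1.0 - NormalDist().cdf(z))
    return min(u1, u2), p


# Hypothetical RNAscope signal per neuron (arbitrary units).
small_intestine = [0.8, 1.1, 0.9, 1.3, 1.0]
colon = [2.4, 3.1, 2.8, 1.9, 3.5]
u_stat, p_value = mann_whitney_u(small_intestine, colon)
print(f"U = {u_stat:.1f}, P = {p_value:.4f}")
```

For small samples an exact permutation-based P value would be more appropriate than the approximation used here.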
Extended Data Fig. 5 | AHR-dependent gene expression and effects on colon
myenteric neurons. a, The top 30 genes upregulated in colonic neurons by
AHR-ligand treatment (AHR-induced CUEGs) were identified on the basis of
fold-change criteria (log-transformed fold change =2 < maximum).
b, Cyp1a1::cre;Rosa26eYFP reporter mice were intraperitoneally injected with
3MC five days before GFP immunostaining. CYP1A1 induction in response to
ligand-activated AHR signalling is expected to induce expression of eYFP.
c, d, Immunostaining of myenteric ganglia from the colon (c) and small
intestine (d) of 3MC-treated Cyp1a1::cre;Rosa26 mice for peripherin (red),
HuC/D (blue) and eGFP (green). Scale bars, 100 μm. Data represent three
independent experiments. e-g, Live calcium imaging of colonic myenteric
plexus preparations from Wnt1::cre;Rosa26-GCaMP6f mice. Electrically
stimulated Ca2+ transients in enteric neurons under control conditions (e) or in
the presence of the ML-133 blocker”! (10 μM) (f). Data represent four
independent experiments. The greyscale images depict a proximal colon
myenteric plexus preparation in which enteric neurons were stimulated by a
single electrical pulse (top panels) or an electrical pulse train (1 s, 20 Hz; bottom
panels) via a focal electrode positioned on an internodal strand leading into the
myenteric ganglion in the field of view. Left, baseline before stimulation.
Middle, peak GCaMP6f fluorescence of the same ganglion upon electrical
stimulation. Scale bars, 20 μm. Right, Ca2+ transients of individual enteric
neurons (indicated by colour-coded arrows shown in the middle panels)
induced by electrical stimulation. The electrical stimulus was applied at 10 s as
marked by the black arrows. Comparison of the average maximal GCaMP6f
fluorescence amplitudes of neuronal Ca2+ responses (mean ± s.e.m.) under
control conditions (e) or the presence of ML-133 (f) upon single pulse (top)
(n=457 neurons) and pulse train (bottom) (n=526 neurons) electrical
stimulation is shown in g (two-sided paired t-test). h, i, Myenteric ganglia from
colon of control (h) and antibiotic-treated (i) mice hybridized with the Kcnj12
RNAscope probe. Dotted line defines the borders of myenteric ganglia and
arrows indicate Kcnj12-expressing cells. Scale bars, 30 μm. Data represent two
independent experiments. j, Quantification of RNAscope signal (mean ± s.e.m.)
shown in h and i (two-sided non-parametric Mann-Whitney U-test).
n=421 neurons from 4 control and 468 neurons from 4 antibiotic-treated mice.
Abx, antibiotics. k, l, Myenteric ganglia from colon of SPF mice hybridized with
the Ahr (green) (k) and Kcnj12 (blue) (l) RNAscope probes and immunostained
with HuC/D (data not shown). Dotted line defines the borders of myenteric
ganglia and arrows indicate AHR- and KCNJ12-expressing neurons. Scale bars,
30 μm. Data represent two independent experiments. m, Scatter plot shows
positive correlation in RNAscope signal for Ahr (k) and Kcnj12 (l) in myenteric
neurons (F-test). n=1,037 neurons from 3 mice. n, o, Immunostaining of
myenteric ganglia from control (Ahr“;Rosa26eYFP injected with the
AAV9-CaMKII-Cre vector) (n) and Ahr“? (o) mice for AHR (red) and eYFP (green).
Note the lack of overlap between green and red signal in the case of Ahr’? (o).
Data are representative of two independent experiments. Scale bars, 30 μm.
p, Percentage of AHR+ neurons in myenteric ganglia of control
(Ahr'“*;Rosa26eYFP mice injected with the AAV9-CaMKII-Cre vector) and
Ahr'**° mice. Random images were acquired from the colon of each biological
replicate (n=9 for control, n=13 for Ahr" *°), and the average percentage
(mean ± s.d.) of AHR+HuC/D+ cells among the total population of HuC/D+
neurons was calculated (two-sided Student’s t-test).
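
The quantification in panel p reduces to a per-replicate percentage followed by a two-sample comparison. The sketch below illustrates that arithmetic only; the cell counts are invented, the function names are hypothetical, and it assumes the classic equal-variance form of Student's t-test. It returns the t statistic and degrees of freedom; a P value would then be read from a t distribution using a statistics package.

```python
import math
from statistics import mean, stdev


def percent_positive(double_positive, total_neurons):
    """Percentage of AHR+HuC/D+ cells among all HuC/D+ neurons in one replicate."""
    return 100.0 * double_positive / total_neurons


def student_t(a, b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2


# Hypothetical (double-positive, total) counts per replicate.
control_counts = [(45, 60), (50, 62), (41, 55)]
knockout_counts = [(6, 58), (9, 64), (4, 51)]

control = [percent_positive(pos, tot) for pos, tot in control_counts]
knockout = [percent_positive(pos, tot) for pos, tot in knockout_counts]

print(f"control:  {mean(control):.1f} ± {stdev(control):.1f} %")
print(f"knockout: {mean(knockout):.1f} ± {stdev(knockout):.1f} %")
t_stat, dof = student_t(control, knockout)
print(f"t = {t_stat:.2f}, d.f. = {dof}")
```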
Extended Data Fig. 6 | Deletion of Ahr does not alter the organization and
composition of myenteric ganglia. a, Immunostaining of muscularis externa
preparations from the colon of control (top) and Ahr™*° (bottom) mice with
nNOS, eYFP and HuC/D (left) or peripherin, eYFP and VIP (right). Scale bars,
30 μm. b, Immunostaining of muscularis externa preparations from the colon
of wild-type (top) and Ahr” (bottom) mice with PGP9.5 and HuC/D (left), VIP,
peripherin and HuC/D (middle) and PGP9.5 and nNOS (right). Scale bars,
100 μm. Data represent three independent experiments.
Extended Data Fig. 7 | Intravenous administration of AAV vectors does not
elicit an inflammatory response or intestinal dysmotility. a, Cross-sections
from the colon of wild-type (top), wild-type infected with AAV9-CaMKII-Cre
(middle) and Ahr“ (bottom) mice stained with Alcian blue-PAS (left) or
[...] experiments. b, Graph (mean ± s.d.) shows that administration of
AAV-CaMKII-Cre vector into wild-type mice is not sufficient to alter intestinal
transit time. n=3 (wild type), 4 (WT + AAV) or 3 (AAr mice. Statistical test is a
two-sided non-parametric Mann-Whitney U-test. Scale bars, 50 μm.
Life sciences study design

Sample size: Sample size was determined on the basis of preliminary experiments or reports in the literature.
Data exclusions: The intestinal transit time (ITT) data from one animal were excluded owing to a health issue.
Replication: Experiments were repeated in at least two biologically independent experiments. In all cases results were reproducible.
Randomization: Littermates (of mixed or the same sex) were randomly assigned to experimental groups in an age range of 8-16 weeks unless otherwise specified.
Blinding: Blinding was performed wherever possible, such as in the experiments in Fig. 3a-d and h-k, Extended Data Fig. 7c and Extended Data Fig. 9b.
Antibodies

Antibodies used

Antibody                   Supplier                        Catalogue no.      Host     Clonality  Dilution
α-AhR                      Enzo Life Sciences              BML-SA550-0100     Rabbit   PC         1:200
α-cKit                     R&D Systems                     AF1356             Goat     PC         1:200
α-Calbindin                Abcam                           ab82812            Mouse    MC         1:200
α-Calretinin               Swant                           CG1                Goat     PC         1:200
α-ChAT                     Millipore (Chemicon)            AB144P             Goat     PC         1:200
α-GFP                      Abcam                           ab13970            Chicken  PC         1:200
α-HuC/D                    Invitrogen                      A-21271            Mouse    MC         1:200
α-mCherry                  SICGEN                          AB0040-200         Goat     PC         1:200
α-NF-M                     Proteintech                     66396-1-Ig         Mouse    MC         1:200
α-nNOS                     Abcam                           ab1376             Goat     PC         1:200
α-Peripherin               Santa Cruz                      sc-7604            Goat     PC         1:200
α-PGP9.5                   Bio-Rad                         7863-0504          Rabbit   PC         1:1000
α-S100β                    DAKO                            z-0311             Rabbit   PC         1:500
α-Sox10                    Santa Cruz Biotechnology        sc-17342           Goat     PC         1:200
α-Tuj1                     Biolegend                       MMS-435P (801201)  Mouse    MC         1:200
α-VIP                      Abcam                           ab22736            Rabbit   PC         1:200
α-Mouse Alexa Fluor 405    Abcam                           ab175658           Donkey   PC         1:500
α-Mouse Alexa Fluor 488    Invitrogen (Life Technologies)  A-21202            Donkey   PC         1:500
α-Goat Alexa Fluor 488     Invitrogen (Life Technologies)  A-11055            Donkey   PC         1:500
α-Rabbit Alexa Fluor 568   Invitrogen (Life Technologies)  A-10042            Donkey   PC         1:500
α-Goat Alexa Fluor 568     Invitrogen (Life Technologies)  A-11057            Donkey   PC         1:500
α-Mouse Alexa Fluor 647    Invitrogen (Life Technologies)  A-31571            Donkey   PC         1:500
α-Rabbit Alexa Fluor 647   Invitrogen (Life Technologies)  A-31573            Donkey   PC         1:500
α-Rabbit Alexa Fluor 488   Invitrogen (Life Technologies)  R37118             Donkey   PC         1:500
α-Chicken Alexa Fluor 488  Jackson ImmunoResearch          703-545-155        Donkey   PC         1:500

Validation: All antibodies used in our study are commercially available and have been validated by the manufacturer. The catalogue number of
each antibody is given above.


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals: C57BL/6, Cyp1a1::Cre, AhR-/-, AhRfl/fl, R26LSL-Cyp1a, R26EYFP, Wnt1::Cre, Ai95D(RCL-GCaMP6f)-D and ChAT-TVA-mCherry
[...] this study. For our experiments we used male mice unless stated otherwise in Methods or relevant figure
[...] were age matched and used at 8-16 weeks of age. All mice were maintained under specific pathogen-free (SPF)
conditions on a 12-hour light/dark cycle (7am-7pm), and provided food and water ad libitum. For the intestinal motility rescue
experiment, animals were fed with purified diet supplemented with I3C (200 mg/kg; ssniff Spezialdiäten GmbH). GF mice were
maintained in flexible-film isolators at the Clean Mouse Facility of the University of Bern (Switzerland). For bacterial colonisation
experiments, faecal contents of SPF mice were orally administered to wild-type GF mice and colonised animals were co-housed
and maintained in the SPF facility of the University of Bern for 4 weeks prior to the analysis.

Wild animals: No wild animals included.
Field-collected samples: No field-collected samples included.
Ethics oversight: All work was carried out under a Home Office Project Licence and was approved by the Crick Institute's Animal Welfare and Ethical
Review Body (UK). Animal work at the University of Bern was carried out according to Swiss Federal Regulations.


Note that full information on the approval of the study protocol must also be provided in the manuscript. 
Shear stress on arteries produced by blood flow is important for vascular
development and homeostasis but can also initiate atherosclerosis’. Endothelial cells 
that line the vasculature use molecular mechanosensors to directly detect shear 
stress profiles that will ultimately lead to atheroprotective or atherogenic responses’. 
Plexins are key cell-surface receptors of the semaphorin family of cell-guidance 
signalling proteins and can regulate cellular patterning by modulating the 
cytoskeleton and focal adhesion structures* °. However, a role for plexin proteins in 
mechanotransduction has not been examined. Here we show that plexin D1 (PLXND1) 
has a role in mechanosensation and mechanically induced disease pathogenesis.
PLXND1 is required for the response of endothelial cells to shear stress in vitro and
in vivo and regulates the site-specific distribution of atherosclerotic lesions. In
endothelial cells, PLXND1 is a direct force sensor and forms a mechanocomplex with
neuropilin-1 and VEGFR2 that is necessary and sufficient for conferring 
mechanosensitivity upstream of the junctional complex and integrins. PLXND1 
achieves its binary functions as either a ligand or a force receptor by adopting two 
distinct molecular conformations. Our results establish a previously undescribed 
mechanosensor in endothelial cells that regulates cardiovascular pathophysiology, 
and provide a mechanism by which a single receptor can exhibit a binary biochemical
nature.


Endothelial cells (ECs) are constantly exposed to the haemodynamic 
forces of blood flow, including the frictional force of fluid shear 
stress that—depending on vessel geometry—can be protective or 
pathogenic. Whereas disturbed or atheroprone flow patterns found 
in curvatures and bifurcations are associated with the upregulation of 
pro-inflammatory genes and the deposition of atherosclerotic lesions, 
uniform or atheroprotective shear stress induces the remodelling of 
the cytoskeleton and the alignment of ECs in the direction of flow’. 
The importance of shear stress in the development and function of 
the cardiovascular system has inspired efforts to identify endothelial 
mechanosensors, as they are the first to respond to changes in the 
mechanical environment’. 

Plexins are cellular receptors that have a range of important func- 
tions in axon guidance, tumour progression and immune-cell regu- 
lation’. Plexins are known to act primarily by binding to semaphorin 
ligands, in a cell-bound or free state in solution along with other 
coreceptors, resulting in intracellular signalling events that lead to 
large-scale changes in the cytoskeleton and cell adhesion**. Here we 
show that the guidance receptor PLXND1 acts as a mechanosensor in 
ECs, regulating vascular function and the site-specific distribution 
of atherosclerosis. 


To determine the role of PLXND1 under flow conditions, we trans- 
fected bovine aortic ECs with either a scrambled short interfering 
RNA (siRNA) or siRNA against PLXND1 (Extended Data Fig. 1a) and 
subjected the cells to shear stress. Knockdown of PLXND1 attenuated 
the activation induced by shear stress of the key signalling mediators 
Akt, ERK1 and ERK2 (hereafter ERK1/2) and eNOS (Extended Data 
Fig. 2a). PLXND1-dependent mechanotransduction is independent 
of its ligand SEMA3E, as incubation with a SEMA3E-blocking antibody
did not affect the flow-induced activation of signalling cascades 
(Extended Data Fig. 3). Next, we examined the role of PLXND1 in the 
hallmark response to atheroprotective shear stress by examining 
the alignment of ECs in the direction of flow. EC alignment with flow 
direction is highly correlated with atheroresistant regions of arteries 
and has an important function in the activation of anti-inflammatory 
pathways. PLXND1-depleted bovine ECs showed a notable failure 
to align in response to shear stress, and displayed fewer and
more-disorganized actin stress fibres (Extended Data Fig. 2b). Quantification
of the EC alignment by measuring the orientation angle and the elongation
factor indicates that PLXND1 is required for the alignment of ECs with the
flow. We also examined the mRNA levels of the Krüppel-like factors Klf2 and
Klf4, key anti-inflammatory transcription factors that are known to be
upregulated by atheroprotective shear stress*”. We found that knockdown of
PLXND1 attenuated flow-induced upregulation of both of these genes compared
with control mouse ECs (Fig. 1a). We then investigated whether PLXND1
could mediate the endothelial response to disturbed shear stress.
We subjected mouse ECs to atheroprone flow for 24 h and examined
mRNA levels of the pro-inflammatory genes monocyte chemoattractant
protein-1 (Mcp1 (also known as Ccl2)) and vascular cell adhesion
molecule-1 (Vcam1)"°. We noted that knockdown of PLXND1 in
ECs treated with siRNA significantly reduced the upregulation of
both genes in response to atheroprone shear stress (Fig. 1b). Taken
together, these data demonstrate that PLXND1 is a critical mediator
of key shear-stress responses in ECs.

Fig. 1 | PLXND1 mediates the EC response to fluid shear stress and regulates
the site-specific distribution of atherosclerosis. a, b, Mouse ECs were
transfected with either scrambled or Plxnd1 siRNA and exposed to either
atheroprotective or atheroprone flow for 24 h, using a cone-and-plate
viscometer. qPCR was performed to quantify the expression of Klf2 and Klf4 in
samples subjected to atheroprotective flow, and expression of the
inflammatory markers Mcp1 (also known as Ccl2) and Vcam1 in samples
subjected to atheroprone flow. n=4 biological replicates. c, The descending
thoracic aorta was isolated and prepared en face from Plxnd1“ and Plxnd1®*°
mice and stained for β-catenin, phalloidin and DAPI to visualize the cell
junctions, actin stress fibres and nuclei. Quantification of alignment and mean
fluorescence intensity was performed using ImageJ; 3-5 images (each image
has n<100 cells) taken from 3 regions along the length of the descending aorta
were obtained from n=5 mice of each genotype (exact sample numbers are
provided in the Source Data). d, Representative en face preparations of whole
aortas showing atherosclerosis in Plxnd1““Apoe−/− and Plxnd1®*°Apoe−/− mice
after 10 weeks of high-fat diet feeding, visualized by oil-red-O staining.
e, Quantification of the lesion area in whole aortas and aortic arches from
Plxnd1“"Apoe−/− and Plxnd1®“°Apoe−/− mice. n=8 mice. f, Aortic arches from
Plxnd1"Apoe−/− and Plxnd1®°Apoe−/− mice were isolated and qPCR was
performed for expression of the inflammatory markers Vcam1 and Mcp1. n=5.
Data are mean ± s.e.m. P values were obtained using two-tailed Student’s t-tests
using GraphPad Prism. *P<0.05, **P<0.01, ***P<0.001, ****P<0.0001. Scale
bar, 20 μm.

To explore the biological relevance of our findings, we used a transgenic
mouse model to enable the endothelial-specific inducible deletion
of PLXND1 (Plxnd1"“°) (Extended Data Figs. 1b, 7b). Confocal
imaging of actin filaments in ECs and staining for the junctional marker
β-catenin revealed a reduction in the elongation of ECs and a reduced
intensity of actin stress fibres in the absence of PLXND1 (Fig. 1c),
consistent with in vitro observations (Extended Data Fig. 2b).
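
The alignment measurements referred to above (orientation angle and elongation factor) can be illustrated with a short sketch. The text does not give the exact ImageJ definitions, so the code assumes two common ones: the orientation angle is the angle between a cell's principal axis and the flow direction, and the elongation factor is the ratio of the major to the minor axis of the best-fitting ellipse. The function name and the sample cell are hypothetical.

```python
import math


def shape_descriptors(points, flow_angle_deg=0.0):
    """Orientation angle (degrees from flow, 0-90) and elongation factor of one cell.

    `points` is a list of (x, y) pixel coordinates belonging to the cell.
    Both values come from the second-order moments of the coordinates.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cxx = sum((x - mx) ** 2 for x, _ in points) / n
    cyy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n

    # Direction of the principal (major) axis relative to the x axis.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)

    # Eigenvalues of the covariance matrix: variance along each axis.
    half_trace = (cxx + cyy) / 2.0
    root = math.sqrt(((cxx - cyy) / 2.0) ** 2 + cxy ** 2)
    major_var, minor_var = half_trace + root, half_trace - root
    if minor_var <= 0:
        raise ValueError("points are collinear; elongation is undefined")
    elongation = math.sqrt(major_var / minor_var)

    # Fold the angle into the range 0-90 degrees relative to the flow axis.
    angle = abs(math.degrees(theta) - flow_angle_deg) % 180.0
    angle = min(angle, 180.0 - angle)
    return angle, elongation


# Hypothetical cell: a 9 x 3 pixel block elongated along the flow (x) axis.
aligned_cell = [(x, y) for x in range(9) for y in range(3)]
angle, elongation = shape_descriptors(aligned_cell)
print(f"orientation = {angle:.1f} degrees, elongation = {elongation:.2f}")
```

Under these assumptions, well-aligned cells give angles near 0 degrees and elongation factors well above 1, whereas cells that fail to align give a broad spread of angles and elongation factors closer to 1.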

Given the decrease in the expression of inflammatory genes in
response to atheroprone shear stress in vitro that was observed with
loss of PLXND1 (Fig. 1b), we assessed the role of endothelial PLXND1
in a pathophysiological setting. Atherosclerotic lesions are known
to occur in regions of the vasculature with low or disturbed blood
flow, flow reversal and other complex spatiotemporal flow patterns’.
Systemic risk factors, such as hypercholesterolaemia, interact with
local biomechanical factors to initiate and advance the deposition
of atherosclerotic plaques. To assess whether endothelial deletion
of PLXND1 affected atherosclerosis in vivo, we crossed Plxnd1
and Plxnd1"“° mice with hypercholesterolaemic apolipoprotein-E-deficient
(Apoe−/−) mice" and fed them a high-fat diet for 10 weeks.
Although the body weights and lipid levels of the mice were unaffected
by the loss of PLXND1 (Extended Data Fig. 4a), quantification
of oil-red-O-stained aortic samples revealed a significant decrease
in the plaque burden of both the whole aorta and the aortic arch in
Plxnd1®**°Apoe−/− mice (Fig. 1d, e). To explore these differences further,
we examined the expression of inflammatory markers in the
inner curvature of the aortic arch. Immunostaining and quantitative
(q)PCR analysis showed reduced levels of MCP-1 and VCAM-1 in
Plxnd1®“°Apoe−/− compared with Plxnd1“Apoe−/− mice (Fig. 1f and
Extended Data Fig. 4b). Given the atheroprotective role of laminar
shear stress and the reduced alignment of ECs with loss of PLXND1, we
examined the effects in the atheroprotected descending aorta. After
a high-fat diet for an extended period of 20 weeks, we observed an
increase in the plaque burden in the descending aortas of Plxnd1*“°
Apoe−/− mice (Extended Data Fig. 5); these plaques also appeared to
correlate with intercostal branch points that have flow disturbances.
Together, these results show that endothelial PLXND1 is required for
the endothelial response to fluid shear stress and the site-specific
distribution of atherosclerosis.

The requirement of PLXND1 in flow-mediated responses in vitro 
and in vivo prompted us to investigate whether this is because 
PLXND1 is simply a player in mechanochemical signalling cascades
or functions as a mechanoreceptor that is capable of detecting 
mechanical force. We applied tensional forces, using a magnetic 
system”, to paramagnetic beads coated with an antibody that rec- 
ognizes the extracellular domain of PLXND1 and examined force 
responses using four different readouts. First, force on PLXND1 
induced activation of the same signalling cascades (ERK1/2, Akt 
and VEGFR2) (Fig. 2a), as those induced by shear stress” (Extended 
Data Fig. 2a). Second, we observed a robust transient increase in 
intracellular calcium levels in ECs when force was applied on PLXND1 
(Fig. 2b), similar to the response observed for other recently dis- 
covered mechanosensors™”. Third, we examined cytoskeletal 
responses”: ECs responded to the application of force on PLXND1 
by exhibiting a robust increase in both vinculin-positive focal 
adhesions (Fig. 2c) and ligated integrin β1 staining (Extended
Data Fig. 6a). Notably, the mechanotransduction response was not 
restricted to the vicinity of the magnetic bead under tension, but
was a global cell-wide phenomenon. Fourth, we examined the
phosphorylation of vinculin at Y822, a site known to be phosphorylated
when force is applied on E-cadherin”, and observed a significant
increase in its activation after force application on PLXND1 (Fig. 2d).
These mechanoresponses were specific to PLXND1, as ECs incubated
with beads coated with another transmembrane receptor (CD44)
or poly-L-lysine did not respond to force. These data demonstrate
that PLXND1 is a direct force sensor that can elicit robust and global
mechanical signalling in ECs.

Fig. 2 | PLXND1 is a mechanosensor that mediates the EC response to force.
a, Mouse ECs were incubated with anti-PLXND1 or CD44 (negative control)
antibody-coated beads and subjected to force (10 pN) for the indicated time
periods. Phosphorylation of VEGFR2, Akt and ERK1/2 was determined by
western blotting and quantified using Image Studio Lite v.5.2. n=3 biological
repeats. Phosphorylated proteins are indicated by ‘p-’. *P<0.05 relative to the no
force condition, *P<0.05 relative to the respective force application time point
with PLXND1. b, Bovine aortic ECs were loaded with Fluo-8AM dye and then
incubated with beads coated with an antibody against the extracellular domain
of PLXND1 or poly-L-lysine (negative control). The beads were then subjected to
force (1 nN). Calcium responses were measured by calculating the fluorescent
intensity of individual cells before (10 s), during (20 s) and after (30 s)
stimulation. Representative images are shown along with quantification.
n=18 cells for PLXND1 and n=19 cells for control across 3 independent
biological replicates. ***P<0.001 relative to unstimulated controls. Scale bar,
10 μm. A representative trace of the calcium influx response over time is also
shown. The arrow marks the start of the stimulation. AU, arbitrary units.
c, Bovine aortic ECs were incubated with anti-PLXND1-coated beads and
subjected to force (10 pN) for 30 min. ECs were fixed and stained with an
anti-vinculin antibody to mark focal adhesions. Focal adhesion (FA) numbers were
quantified using ImageJ software. Values were normalized to the no force
condition. Locations of the beads are highlighted in yellow circles. n=50 cells
for each condition from 3 independent biological replicates. ****P<0.0001.
Scale bar, 10 μm. d, Mouse ECs were incubated with anti-PLXND1 or CD44
antibody-coated beads and subjected to 10 pN force for the indicated time
periods. Phosphorylation of vinculin was determined by western blotting and
quantified using Image Studio Lite v.5.2. n=3 biological repeats. *P<0.05
relative to the no force condition, *P<0.05 relative to the respective force
application time point with PLXND1. Data are mean ± s.e.m. P values were
obtained using two-tailed Student’s t-tests using GraphPad Prism.
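
The calcium readout in Fig. 2b is described as the fluorescence intensity of individual cells before (10 s), during (20 s) and after (30 s) stimulation. As a rough illustration only, the sketch below reads one cell's trace at those three times and expresses each value relative to the pre-stimulation intensity. The normalization step, the function names and the trace values are assumptions and are not taken from the paper.

```python
def intensity_at(times, trace, t):
    """Fluorescence of the sample recorded closest to time t (seconds)."""
    nearest = min(range(len(times)), key=lambda i: abs(times[i] - t))
    return trace[nearest]


def calcium_response(times, trace, before=10.0, during=20.0, after=30.0):
    """Intensity at three time points, normalized to the 'before' value."""
    baseline = intensity_at(times, trace, before)
    return {
        "before": 1.0,
        "during": intensity_at(times, trace, during) / baseline,
        "after": intensity_at(times, trace, after) / baseline,
    }


# Hypothetical single-cell trace sampled every 5 s (arbitrary units).
times = [0, 5, 10, 15, 20, 25, 30, 35]
trace = [100, 101, 100, 180, 250, 170, 120, 105]
response = calcium_response(times, trace)
print({phase: round(value, 2) for phase, value in response.items()})
```

Averaging such normalized values over all cells in a condition would give the kind of per-condition summary that the caption reports.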
Fig. 3 | The PLXND1, NRP1 and VEGFR2 mechanocomplex functions
upstream of known mechanosensory hotspots and is sufficient for the
response to shear stress. a, Schematic showing signalling at the junctional
complex and integrins. b, Mouse ECs were transfected with scrambled or
Plxnd1 siRNA, exposed to shear stress for 2 min before immunoprecipitating
VEGFR2 and analysis of the phosphorylation and association of VEGFR2 with
the p85 subunit of PI3K and VE-cadherin. n=3. IP, immunoprecipitation.
c, Mouse ECs were transfected with scrambled or PLXND1 siRNA, exposed to
shear stress for 30 min before immunoprecipitating integrin a,B, and analysis
of the association of integrin a,B, with Shc. n=3. d, Mouse ECs were treated
with the VEGFR2 kinase inhibitor SU1498, transfected with siRNA against Nrp1
or treated with a NRP1-blocking antibody, incubated with anti-PLXND1-coated
beads and subjected to force for 5 min before analysis of the phosphorylation
of vinculin. n=3; *P<0.05. Data are mean ± s.e.m. P values were obtained using
two-tailed Student’s t-test using GraphPad Prism. e, Mouse ECs were exposed
to shear stress before immunoprecipitating VEGFR2 and analysis of the
phosphorylation and association of VEGFR2 with PLXND1, NRP1 and Src. n=3.
f, Mouse ECs were exposed to shear stress before immunoprecipitating NRP1
and analysis of the association of NRP1 with PLXND1 and VEGFR2. n=3.
g, Mouse ECs transfected with either scrambled or Nrp1 siRNA were exposed to
shear stress before immunoprecipitating VEGFR2 and analysis of the
association of VEGFR2 with PLXND1. n=3. h, Schematic showing that
reconstitution of PLXND1, VEGFR2 and NRP1 in COS-7 cells confers shear-stress
sensitivity to these cells. i, COS-7 cells were left untransfected or transfected
with NRP1 and VEGFR2, with or without PLXND1 before being subjected to
shear stress for 2 min and VEGFR2 was immunoprecipitated. Shear-stress
sensitivity was assessed by analysing the levels of phosphorylated VEGFR2, the
complex formation between VEGFR2 and Src and the complex formation of
PLXND1, VEGFR2 and NRP1. n=3. All shear-stress experiments were at
12 dynes cm−2 using a parallel-plate system.
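
For readers unfamiliar with parallel-plate flow chambers, the wall shear stress for steady laminar flow between two plates is commonly estimated as tau = 6 * mu * Q / (w * h^2), where mu is the fluid viscosity, Q the volumetric flow rate, w the channel width and h the channel height. The sketch below applies this standard relation to find the flow rate needed for 12 dynes per cm^2. The chamber dimensions and the viscosity are hypothetical values chosen for illustration and are not reported in the caption.

```python
def wall_shear_stress(flow_rate, viscosity, width, height):
    """Wall shear stress in a parallel-plate chamber: tau = 6*mu*Q / (w*h^2).

    With CGS units (Q in cm^3/s, mu in dyn*s/cm^2, w and h in cm)
    the result is in dynes per cm^2.
    """
    return 6.0 * viscosity * flow_rate / (width * height ** 2)


def flow_rate_for(target_stress, viscosity, width, height):
    """Flow rate (cm^3/s) needed to reach a target wall shear stress."""
    return target_stress * width * height ** 2 / (6.0 * viscosity)


# Hypothetical chamber and medium properties (assumptions, not from the paper).
VISCOSITY = 0.0072  # dyn*s/cm^2, roughly that of culture medium at 37 C
WIDTH = 2.5         # cm
HEIGHT = 0.025      # cm

q = flow_rate_for(12.0, VISCOSITY, WIDTH, HEIGHT)
print(f"required flow rate: {q:.3f} cm^3/s ({q * 60:.1f} ml/min)")
print(f"check: {wall_shear_stress(q, VISCOSITY, WIDTH, HEIGHT):.1f} dynes/cm^2")
```

Because the stress scales with 1/h^2, small errors in the channel height dominate the uncertainty of the applied shear stress.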
Fig. 4 | PLXND1 flexion is required for mechanotransduction. a, Schematic
domain organization of PLXND1 spanning amino acids 1-1925. SS, signal
sequence; TM, transmembrane region; c, cytoplasmic region.
b, Representative negative-stain class averages of the PLXND1 ectodomain
and corresponding structural models showing the ring-like and open
conformations. Scale bar, 10 nm. Two-dimensional class averages were
obtained by classifying 1,357 particles into 10 classes. c, Model of opening of
the ring-like ectodomain, which confers the mechanosensory functions of
PLXND1. d, Design of the PLXND1 mutant with an intramolecular disulfide bond
to lock the ring-like structure. The magnified view shows the disulfide bond
between the SEMA domain (domain 1) and IPT5 domain (domain 9). e, f, Bovine
ECs in which endogenous PLXND1 was knocked down were infected with
adenoviruses expressing wild-type or mutant PLXND1, treated with SEMA3E
for 30 min or incubated with anti-PLXND1 paramagnetic beads followed by
force application (10 pN; 30 min). Cells were immunostained with anti-vinculin
antibodies. Focal adhesion numbers were quantified using ImageJ; n=30 cells
across either 4 (e) or 3 (f) biological replicates. ****P<0.0001. Scale bar, 10 μm.
g, COS-7 cells were transfected with wild-type or mutant PLXND1, NRP1 and
VEGFR2 before shear-stress application for 2 min and VEGFR2 was
immunoprecipitated. Shear-stress sensitivity was assessed by analysing the
levels of phosphorylated VEGFR2, the complex formation between VEGFR2
and Src and the complex of PLXND1, VEGFR2 and NRP1. n=3. h, Mouse ECs in
which endogenous PLXND1 was knocked down were infected with
adenoviruses expressing wild-type or mutant PLXND1 and incubated with
anti-PLXND1 paramagnetic beads followed by force application (10 pN).
Phosphorylation of Akt, ERK1/2 and VEGFR2 was determined. n=3. *P<0.05
relative to the no force condition; *P<0.05 relative to the force time point of the
respective wild-type protein. i, Mouse ECs in which endogenous PLXND1 was
knocked down were infected with adenoviruses expressing wild-type or
mutant PLXND1 and subjected to fluid shear stress. Phosphorylation of Akt,
ERK1/2 and eNOS was determined. n=3 biological repeats. *P<0.05 relative to
the static condition; *P<0.05 relative to the respective shear time point of the
wild-type protein. Data are mean ± s.e.m. P values were obtained using
two-tailed Student’s t-test using GraphPad Prism.


Cell-cell and cell-matrix adhesions represent two sites that are 
highly mechanically active within ECs. These sites include the junc- 
tional mechanosensory complex comprising PECAM-1, VEGFR2 and 
VE-cadherin and integrins at the cytoskeleton-extracellular matrix 
interface’” (Fig. 3a). We analysed the relationship between PLXND1 
and these mechanical ‘hotspots’ in ECs. En face confocal imaging 
revealed robust and similar expression of PLXND1 in ECs in both 
the arch and descending aorta and colocalization with PECAM-1 at 
cell-cell junctions (Extended Data Fig. 7a). Staining was specific as 
it was not observed in Plxnd1*“° aortas (Extended Data Fig. 7b). 
SEMA3E was also observed in en face sections and expression of
SEMA3E was found to be lower in the arch (Extended Data Fig. 7c). 
Co-immunoprecipitation experiments showed a flow-induced asso- 
ciation of PLXND1 with components of the junctional mechanosen- 
sory complex (PECAM-1, VEGFR2, VE-cadherin and the p85 subunit 
of PI3K) (Extended Data Fig. 7d). To explore whether PLXND1 is just
another component of the junctional complex or whether it operates 
upstream of the junctional complex, we used immunoprecipitation 
to analyse complex formation at the level of the junctional mecha- 
nosensory complex and integrin-matrix adhesions. We found that 
responses at the junctional complex, such as shear-stress-induced 
phosphorylation of VEGFR2 and association of the p85 subunit of 
PI3K and VE-cadherin with VEGFR2®, were all abrogated by knock- 
down of PLXND1 (Fig. 3b). Consistent with this observation, both the 
inhibition of the VEGFR2 receptor kinase (Fig. 3d) and the deletion of 
PECAM-1 abrogated force-induced signalling, suggesting that junc- 
tional mechanosensory components are necessary intermediates 
for the PLXND1-mediated force response (Extended Data Fig. 8a). 
Similarly, flow-induced complex formation at integrin-matrix adhe- 
sions (as assayed by association of Shc with integrin «,B;)'*”’ was also 
strongly reduced with loss of PLXND1 (Fig. 3c). A previous study has 
highlighted a role for PIEZO1-mediated and Ga,/G,,-mediated mecha- 
nosignalling, although there are conflicting reports as to whether 
these pathways are linked”°” or independent of each other”””’. Force 
application on PLXND1 showed that loss of Ga,/G,, abolished the 
PLXND1 force response, whereas knockdown of PIEZO1 had no effect 
(Extended Data Figs. 1e, f, 8b, c).

To further investigate the molecular mechanisms, we examined 
the role of the PLXND1 coreceptor neuropilin-1 (NRP1). NRP1 is a
cell-surface transmembrane protein that acts as a SEMA3 and VEGF 
coreceptor for PLXND1 and VEGFR2, respectively”, and its presence 
in neurons switches the SEMA3E signal from repulsion to attraction”.
We found that NRP1 was required for the PLXND1-mediated force 
response, as both knockdown (Extended Data Fig. 1d) and inhibition 
of NRP1 abolished the force-induced phosphorylation of vinculin 
(Fig. 3d). We also observed that shear stress induced the formation 
of a complex between PLXND1, VEGFR2 and NRP1 (Fig. 3e, f) and this
complex was dependent on NRP1 (Fig. 3g). Taken together, these
data show that PLXND1 associates with NRP1 and VEGFR2 in response
to flow and operates upstream of both the junctional complex and 
integrins. 

To test whether PLXND1 (and its molecular partners) is sufficient 
to confer mechanosensitivity in a heterologous cell line, we trans- 
fected COS-7 cells with plasmids expressing PLXND1, NRP1 and/ 
or VEGFR2 and applied shear stress (Fig. 3h). These cells do not 
express any of the components of the junctional complex (that is, 
PECAM-1 or VE-cadherin) and are therefore an ideal system to monitor
mechanical responses that are specifically due to PLXND1. COS-7
cells expressing all three proteins (VEGFR2, NRP1 and PLXND1) 
showed activation of early signalling responses, including phos- 
phorylation of VEGFR2, association of VEGFR2 with Src tyrosine 
kinase, and PLXND1-VEGFR2-NRP1 complex formation in response 
to shear stress. Notably, none of these responses occurred in the 
absence of PLXND1, thus providing further evidence that PLXND1
is a specific and direct force sensor (Fig. 3i). Overall, these data
provide evidence that PLXND1 is necessary and sufficient for the
shear-stress-induced response. To further demonstrate that PLXND1 
operates as a specific force sensor, we applied force on other ele- 
ments of the complex. As shown in Extended Data Fig. 9, applica- 
tion of force on either NRP1 or VEGFR2 did not elicit downstream 
responses. Taken together, these data show that PLXND1 is a specific
and direct mechanosensor. 

The mechanical response of PLXND1 is in stark contrast to the
ligand response, as force on PLXND1 increases focal adhesions,
whereas SEMA3E treatment reduces focal adhesions and leads to
the collapse of the actin cytoskeleton (Extended Data Fig. 6b).
Structure-function studies of semaphorins, plexins and their cog- 
nate complexes have established that the ligand-binding response 
requires a dimeric semaphorin to engage the N-terminal SEMA 
domains of two plexin receptors. Recent crystal structures
and negative-stain electron-microscopy analyses of the entire, 
10-domain class-A plexin (PLXNA) ectodomains revealed a distinc- 
tive ring-like conformation that is suitable for coupling extracellular 
semaphorin-based dimerization through to the transmembrane and 
cytoplasmic regions to transduce the ligand-binding response.
However, the negative-stain electron-microscopy studies also 
revealed that the PLXNA ectodomain is capable of flexion, with 
distinctive minor populations of more-open conformations. We car- 
ried out negative-stain electron-microscopy analysis of the PLXND1 
ectodomain and found evidence that it can flex to a more-open 
conformation, although the dominant state is ring-like (Fig. 4a, b 
and Extended Data Fig. 10a). We speculated that the ability to have 
flexion and switch between these two conformation states might 
provide an explanation for the binary nature of the functions of 
PLXND1 (Fig. 4c). To examine this, we generated the double mutant
PLXND1(Y517C/A1135C), which is designed to promote the forma- 
tion of an intramolecular disulfide bond between domain 1 and 
domain 9 of the PLXND1 ectodomain (Fig. 4d). On the basis of struc- 
tural analyses, we predicted that the introduction of this disulfide 
bridge would lock the receptor ectodomain into the ring-like con-
formation, which would still enable the ligand-binding response
by SEMA3E but would prevent the switch to the open and putative
mechanosensory conformation. Purification of the protein and a
subsequent quantitative assay using a thiol-reactive fluorescent 
dye, as well as negative-stain electron microscopy, demonstrated 
that the protein did indeed contain the desired covalent disulfide 
links (Extended Data Fig. 10b, c). 

PLXND1-depleted ECs were infected with adenovirus expressing 
either wild-type or mutant PLXND1 and were assayed for their ability 
to respond to SEMA3E or mechanical force. Treatment with SEMA3E 
resulted in a decrease in focal adhesions in both wild-type and mutant 
PLXND1-expressing cells (Fig. 4e), showing that the PLXND1 ecto- 
domain—when locked into a ring-like conformation—maintains its 
ability to bind to SEMA3E and signal to cause the disassembly of the
cytoskeleton. We then tested whether trapping the PLXND1 in the 
semaphorin-binding ring-like conformation was permissive of its 
mechanosensory function. We found that cells expressing mutant 
PLXND1 did not respond to mechanical force, as assayed by the acti- 
vation of early signalling responses (phosphorylation of VEGFR2, 
Akt and ERK1/2 in Fig. 4h), cytoskeleton signalling (phosphorylation 
of vinculin in Extended Data Fig. 11) and focal adhesion maturation 
(Fig. 4f). To further determine the requirement for PLXND1 flexion in 
mechanotransduction, we examined the effects of mutant PLXND1 in 
shear stress signalling. In contrast to ECs expressing wild-type PLXND1, 
ECs expressing mutant PLXND1 were unable to activate Akt, ERK1/2 or 
eNOS in response to shear stress (Fig. 4i). Additionally, reconstitution 
of mutant PLXND1 in COS-7 cells blocked early shear-stress responses,
including phosphorylation of VEGFR2, association of VEGFR2 with Src 
tyrosine kinase and shear-stress-induced VEGFR2 and NRP1 complex 
formation (Fig. 4g). Taken together, these results demonstrate that 
trapping PLXND1 in its ring-like conformation maintains its ligand-
dependent signalling function but compromises its ability to sense
and respond to mechanical force. 

Our work identifies the semaphorin-binding receptor PLXND1 as 
a force detector in ECs. One of the best-characterized mechanosen- 
sors to date is the junctional mechanosensory complex, in which 
PECAM-1 is the molecule that can sense and respond to mechanical
force. Given the proven crucial role of shear stress in cardiac
and vascular development, it was always difficult to reconcile
the lack of developmental defects in the PECAM-1 knockout mice. 
We now identify a previously undescribed mechanosensor in ECs 
that operates upstream of the junctional complex. We show that 
onset of shear stress induces the formation of a mechanocomplex 
of PLXND1-NRP1-VEGFR2; this complex requires the presence of 
NRP1 as well as flexion in the PLXND1 ectodomain. Endothelial PLXND1
regulates signals at junctions and integrins and downstream cellular 
responses to shear stress that ultimately regulate the site-specific 
distribution of atherosclerosis. The developmental cardiovascular 
defects observed in global, as well as EC-specific, PLXND1 knockout
mice are in agreement with a requirement for this mechanosensor 
during development, and—as our data now demonstrate—also in 
the adult. Despite the importance of mechanosensation in biology, 
knowledge of how mechanoreceptors detect physical force is limited. 
Our data identify a mechanosensor in ECs and provide a framework 
for understanding how ligand-dependent and mechanical signals 
can be channelled through a single receptor. 
Methods


Data reporting 
No statistical methods were used to predetermine sample size. Mouse 
experiments were randomized. 


Experimental mice 

All mouse experiments were approved and authorized by both the 
University of Oxford Local Animals Ethics and Welfare Committee 
and by the UK Home Office. Project licences used in this work were 
30/3080 and POC27F69A. Plxnd1fl/fl mice were obtained from J. Epstein.
To obtain the endothelial-cell-specific deletion of Plxnd1, Plxnd1fl/fl mice
were crossed with mice that expressed an inducible Cre recombinase
under the Cdh5 driver, which were obtained from R. Adams. Three
consecutive intraperitoneal injections of tamoxifen (2 mg each) in adult
mice (6-8 weeks of age) resulted in the deletion of endothelial Plxnd1
to generate PLXND1 inducible endothelial-cell knockout (Plxnd1iECKO)
mice. For atherosclerosis studies, the Plxnd1fl/fl Cdh5-Cre mice were crossed
into the hypercholesterolaemic apolipoprotein-E-deficient (Apoe−/−)
mouse background. Only female mice were used for atherosclerosis
studies. All mice used in this study were maintained on a C57BL/6J
background. For en face immunofluorescence analysis and qPCR of aortas,
tissues were collected two weeks after the last tamoxifen injection. For 
atherosclerosis studies, a high-fat diet was commenced one week after 
the last tamoxifen injection. 

All mice were housed in individually ventilated cages at 22 °C, with 
56% relative humidity and a light-dark cycle of 12 h-12 h, and were fed 
a standard chow diet (B&K). For high-fat diet experiments carried out
with hypercholesterolaemic Apoe−/− mice, the mice were fed western
RD (P) VP 25kGy diet containing 20% fat, 0.15% cholesterol (829108, 
SDS) for 10 weeks or 20 weeks. Water and food were available ad libi- 
tum at all times. 


Genotyping 

Genotyping was carried out using PCR analysis of DNA from ear notches, 
collected to identify the mice, using the Phire Tissue Kit (F140-WH, 
Thermo Scientific). 


En face preparations 

Mice were placed under a terminal general anaesthesia with isoflu- 
rane, followed by exsanguination and perfusion fixation with 4% para- 
formaldehyde. The entire length of the aorta was dissected out and the 
surrounding connective tissue and adventitial fat were removed. The 
aorta was fixed in 4% paraformaldehyde and stored at 4 °C in PBS until
staining. Atheroprone areas from the inner curvature of the aortic arch 
were isolated and atheroprotective areas from the thoracic aorta were 
dissected and processed for immunofluorescence studies. 


Oil-red-O staining for atherosclerosis studies 

Fixed aortas were rinsed in absolute propylene glycol and stained with 
oil red O (01516, Sigma Aldrich). After washing in 85% propylene glycol 
solution and distilled water, the aortas were opened longitudinally to 
the iliac bifurcation and a coverslip was placed to flatten down the aorta 
with the endothelial surface facing upwards. Images were acquired 
using an Olympus SZX7 fitted with a 1x lens and image processing was 
performed using Image-Pro (Media Cybernetics). The plaque area was 
quantified as a percentage of the area of both the total aorta and the 
aortic arch. 


Lipid profile analysis 

Blood was sampled by cardiac puncture under terminal general anaes- 
thesia in plasma collection tubes. Plasma samples were shipped to MRC 
Harwell where they were analysed for total cholesterol, triglycerides, 
high-density lipoprotein and low-density lipoprotein levels on an
automated AU680 Clinical Chemistry Analyser.


Cell culture, shear stress and transfections 
Bovine aortic endothelial cells (BAECs), Pecam1-knockout (Pecam1−/−)
and PECAM reconstituted (Pecam1**) mouse cells were cultured as pre-
viously described. Mouse lung ECs were isolated from Plxnd1fl/fl mice
and maintained in EGM2 growth medium (Lonza), supplemented with
10% fetal bovine serum (FBS). COS-7 cells were maintained in DMEM
with 10% FBS. All cell types were maintained at 37 °C in 5% CO2 in a
humidified incubator. Cells were subjected to shear stress using either
a parallel plate chamber or a cone-and-plate viscometer as previously
described. For experiments using a SEMA3E-blocking antibody, cells
were treated with the antibody for 1h before and during shear stress. 
siRNA reverse transfections of Plxnd1 and PLXND1 in mouse ECs and 
BAECs were performed using the Lipofectamine RNAimax Reagent 
(Invitrogen). siRNAs used in this study were from Dharmacon and are 
described below. 

Transfections of plasmids expressing NRP1, VEGFR2 and PLXND1 
in COS-7 cells were performed with Lipofectamine 2000 (Invitrogen) 
according to the manufacturer’s instructions. 


siRNA knockdown in mouse ECs 

The Acell mouse Plxnd1 SMARTpool consisted of GUAUCGACCACAGAUCAUG,
CGUGGACCUUGAAUGGUUU, CUAUUAUAAACAGAUCCAA and
CCAACAAGCUUCUGUACGC; the Acell mouse Piezo1 SMARTpool
consisted of CUAUCAGACACCAUUUAUC, GCCUCAUCCUCUAUAAUGU,
UCAUCAUCUCUAAGAAUAU and CUGUUACGCUUCAAUGCUC; the
Acell mouse Gna11 SMARTpool comprised CCAUUUUCUAAGUUAUUGA,
CUUUUGAGCACCAGUAUGU, CUGUGACCCUUGUAUAUUA
and CUGUCAGAUUUCUUUACUU; the Acell mouse Gnaq SMARTpool
comprised UUGUCAAGUUGUACGAAUU, CCAGGAUCAUAAGUGUUAA,
GUAUAGUGCAAUUAUGAAU and CGAUCAUACUAGGAGGGAU; the Acell
human NRP1 SMARTpool consisted of GGAGGAUUUUCCAUACGUU,
CUUGAAUGCACUUAUAUUG, UGGUUAUCCUCAUUCUUAU and
UCCUGGAAUUUGAAAGCUU. The scramble siRNA was ON-TARGET plus
non-targeting pool comprising UGGUUUACAUGUCGACUAA,
UGGUUUACAUGUUGUGUGA, UGGUUUACAUGUUUUCUGA and
UGGUUUACAUGUUUUCCUA.


siRNA knockdown in BAECs 


A PLXND1 custom duplex comprising GGGAAAACAUCGAGGCCAAUU
and UUGGCCUCGAUGUUUUCCCUU. 


RNA extraction and qPCR 

Total RNA extraction was performed from cells or from tissue using the 
RNeasy Plus Mini kit (Qiagen), with an additional genomic DNA-wipeout 
step. Reverse transcription was performed using the Superscript III 
cDNA synthesis kit. qPCR was performed in triplicate with SYBR green 
and a CFX96 real-time system. Thermocycling conditions were 95 °C
for 3 min, followed by 40 cycles of 95 °C for 15 s, 60 °C for 45 s. Gene 
expression was normalized to the constitutively expressed housekeep- 
ing gene 18S rRNA, and relative expression was calculated and plotted 
using the ΔΔCt method. Primer sequences used were as follows. Klf2,
5’-CTAAAGGCGCATCTGCGTA-3’, 5’-TAGTGGCGGGTAAGCTCGT-3’;
Klf4, 5’-CGACTAACCGTTGGCGTGA-3’, 5’-GAGGTCGTTGAACTCCTCGG-3’;
Mcp1, 5’-CATCCACGTGTTGGCTCA-3’, 5’-GATCATCTTGCTGGTGAATGAGT-3’;
Vcam1, 5’-GCTATGAGGATGGAAGACTCTGG-3’, 5’-ACTTGTGCAGCCACCTGAGATC-3’;
18S rRNA, 5’-AGGAATTGACGGAAGGGCACCA-3’, 5’-GTGCAGCCCCGGACATCTAAG-3’.
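
The relative-expression calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name, the argument names and the example Ct values are hypothetical, and the calculation assumes approximately 100% amplification efficiency for both the target gene and the 18S rRNA reference.

```python
from statistics import mean


def fold_change_ddct(target_ct_treated, ref_ct_treated,
                     target_ct_control, ref_ct_control):
    """Relative expression by the 2^(-ddCt) method.

    Each argument is a list of technical-replicate Ct values (for example
    triplicates). Assumes roughly 100% amplification efficiency.
    """
    # dCt: target gene normalized to the reference gene within each condition
    d_ct_treated = mean(target_ct_treated) - mean(ref_ct_treated)
    d_ct_control = mean(target_ct_control) - mean(ref_ct_control)
    # ddCt: treated condition relative to the control condition
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values, chosen only to illustrate the arithmetic
example_fold = fold_change_ddct(
    target_ct_treated=[24.0, 24.0, 24.0],
    ref_ct_treated=[10.0, 10.0, 10.0],
    target_ct_control=[26.0, 26.0, 26.0],
    ref_ct_control=[10.0, 10.0, 10.0],
)
print(f"fold change: {example_fold:.2f}")
```

A target that crosses threshold two cycles earlier than in the control, at equal reference Ct, corresponds to a fourfold increase in relative expression.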


Immunofluorescence 

The permeabilization of tissues and cells was performed by incubation 
with 0.5% Triton X-100 overnight and 0.2% Triton X-100, respectively, 
and cells were blocked with 10% normal goat serum and 1% BSA. Inner 
curvatures of the aortic arch were incubated with primary antibodies 
(CD106 (VCAM-1, 553330, BD Biosciences) and MCP-1 (ab7202, Abcam)) 


and descending aorta segments were incubated with primary anti- 
bodies (PLXND1 (PA5-21605, Thermo Fisher Scientific) and PECAM-1 
(553369, BD Biosciences)) before incubation with Alexa Fluor 488- and 
Alexa Fluor 568-conjugated secondary antibodies (1:100; Invitrogen). 
Cells were subjected to shear stress or tissues were incubated at 4 °C 
overnight in β-catenin (610153, BD Biosciences) followed by 1 h incubation
with Alexa Fluor 488-conjugated phalloidin (Invitrogen) at room
temperature and DAPI (Invitrogen). Tissues were mounted en face with 
Prolong Gold Antifade mountant (Invitrogen) for confocal imaging 
using an Olympus FluoView3000. 


Image analysis 

For the quantification of in vitro flow experiments, cell alignment in the
direction of the flow was determined by measuring the angle between
the flow direction and the long axis of the cell as determined visually.
Cell elongation was estimated as the ratio of cell length to cell width
in both in vitro and in vivo studies. Measurement of the fluorescence
intensity of VCAM-1, MCP-1 and phalloidin was performed using ImageJ
software (options: Analyze, Set measurements, Mean gray value, Meas-
ure). Quantification of the colocalization was performed using the
coloc2 plugin in ImageJ.
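
The two shape metrics described above (alignment angle and elongation ratio) can be expressed compactly. The sketch below is illustrative only: in the study the long axis was determined visually, whereas here it is assumed that the long axis is already available as a 2D vector; the function names are hypothetical.

```python
import math


def alignment_angle_deg(cell_axis, flow_direction=(1.0, 0.0)):
    """Acute angle (0-90 degrees) between a cell's long axis and the flow.

    Both arguments are 2D vectors. An axis has no head or tail, so the
    absolute value of the dot product is used.
    """
    ax, ay = cell_axis
    fx, fy = flow_direction
    dot = ax * fx + ay * fy
    norm = math.hypot(ax, ay) * math.hypot(fx, fy)
    cos_theta = min(1.0, abs(dot) / norm)  # clamp against rounding error
    return math.degrees(math.acos(cos_theta))


def elongation_ratio(cell_length, cell_width):
    """Cell elongation as the ratio of cell length to cell width."""
    return cell_length / cell_width


print(alignment_angle_deg((1.0, 1.0)))   # axis at 45 degrees to the flow
print(elongation_ratio(40.0, 10.0))      # length four times the width
```

An angle near 0 degrees indicates a cell aligned with the flow, and a ratio near 1 indicates a rounded, non-elongated cell.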


Co-immunoprecipitation and western blotting 

Cells were collected in lysis buffer as previously described and sup-
plemented with protease and phosphatase inhibitor cocktail tablets.
Lysates were precleared with 10 μl protein A/G plus sepharose beads
(Santa Cruz Biotechnology) for 1 h at 4 °C. The precleared lysates were
then incubated with 20 μl of protein A/G plus sepharose beads, which
had previously been coupled with the appropriate primary antibody
for 2 h at 4 °C on an orbital shaker. The beads were washed three times
with lysis buffer supplemented with protease and phosphatase inhibi-
tors. The immunoprecipitation complexes were eluted from the beads
by boiling in 2x SDS buffer for 5 min. 

For all western blotting analyses, protein lysates and co-immunoprecip- 
itation complexes were resolved on a 4-12% gradient gel with the appro- 
priate primary antibodies and IRDye-conjugated anti-mouse, anti-goat or 
anti-rabbit secondary antibodies, as appropriate. Images were acquired on 
a LICOR Odyssey infrared scanner. Densitometric quantification of bands
was performed using the ImageStudio software (LICOR Biosciences). 


Inhibitors, antibodies and other reagents 

The antibodies used for western blotting included phosphorylated 
(p)-ERK1/2 (T202/Y204), total (t)-ERK1/2, p-Akt (S473), t-Akt, p-eNOS 
(S1177), p-VEGFR2 (Y1175), t-VEGFR2 (all antibodies from Cell Signaling 
Technology), t-eNOS (BD Biosciences), p-vinculin (Y822) (Abcam), t-vin-
culin (Sigma Aldrich), PI3K/p85 (Upstate), integrin αvβ3 (clone LM609,
Merck), Shc (Abcam), VE-cadherin (Santa Cruz), PLXND1 (Thermo Fisher
Scientific and Abcam), PIEZO1 (Abcam), Gαq (Santa Cruz Biotechnol-
ogy) and Src (Upstate).

The inhibitors used in the study included the VEGFR2 tyrosine kinase 
inhibitor SU1498 (Sigma Aldrich). Recombinant SEMA3E was purchased 
from R&D Systems (Bio-techne) and used at 10 pM. The NRP1-blocking 
antibody was purchased from R&D Systems and the SEMA3E-blocking 
antibody was from Thermo Fisher Scientific USA. 


Bead pulling/magnetic tweezer system 

Tosyl-activated paramagnetic beads (4.5 μm) were washed with PBS and
coated with an antibody against the extracellular domain of PLXND1
(Santa Cruz) or CD44 (clone 5D2-27 from the Developmental Studies
Hybridoma Bank, USA). Beads were quenched in 0.2 M Tris, pH 7.4 before
use to eliminate any remaining tosyl groups. ECs were incubated with the
beads (and inhibitor or blocking antibody, if appropriate) before force
application for 5-30 min at 37 °C. For immunofluorescence, ECs grown
on fibronectin-coated coverslips were fixed for 20 min in PBS containing
2% formaldehyde, permeabilized with 0.2% Triton X-100 and blocked
with 10% goat serum for 1 h at room temperature. Antibody incubations
for vinculin and HUTS4 were performed as previously described. Focal
adhesion numbers were quantified as previously described. Ligated β1
integrin staining was quantified by determining the mean fluorescence
intensity using ImageJ software. To analyse the phosphorylation of
vinculin, cells were lysed as described above and lysates were immuno-
blotted with a primary antibody against p-vinculin (Abcam).


Calcium imaging 

BAECs were cultured in 33-mm glass-bottom dishes to form a sub- 
confluent monolayer. After the cells had fully attached and spread, 
4 μM of Fluo-8 AM, a calcium-binding dye (Abcam), was added to the
medium. Cells were incubated for 30 min with the dye. Beads conju- 
gated to either PLXND1 or poly-L-lysine were added to the cells and 
incubated for another 30 min. To assess the calcium influx as a result 
of mechanical stimulation, cells with Fluo-8 AM and magnetic beads 
were subjected to 1 nN force applied with magnetic tweezers. Time-
lapse videos of epifluorescent calcium imaging were acquired with a
Nikon Ti-e microscope (60x objective) during 10 s prestimulation, 20 s
stimulation and 30 s poststimulation.

Acquired image sequences were analysed by measuring mean fluo- 
rescence intensity (mean pixel value) for each cell at each frame. Mean 
peak amplitude for each phase (prestimulation, poststimulation and 
during stimulation) was calculated and normalized to the prestimula- 
tion fluorescence intensity for each cell. 
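
One way to read the normalization described above is sketched below for a single cell. This is an interpretation under stated assumptions rather than the authors' analysis code: the per-frame mean intensities are assumed to be available as a list, the frame interval is a hypothetical parameter, the peak of each phase is taken as its maximum, and the baseline is taken as the mean prestimulation intensity.

```python
from statistics import mean


def normalized_phase_peaks(trace, frame_interval_s, pre_s=10.0, stim_s=20.0):
    """Peak fluorescence per phase, normalized to the prestimulation mean.

    trace: mean pixel value of one cell at each frame, in acquisition order.
    The phase durations default to the 10 s / 20 s windows described in the
    text; every frame after the stimulation window counts as poststimulation.
    """
    n_pre = round(pre_s / frame_interval_s)
    n_stim = round(stim_s / frame_interval_s)
    pre = trace[:n_pre]
    stim = trace[n_pre:n_pre + n_stim]
    post = trace[n_pre + n_stim:]
    baseline = mean(pre)
    return {
        "pre": max(pre) / baseline,
        "stim": max(stim) / baseline,
        "post": max(post) / baseline,
    }


# Hypothetical trace sampled at one frame per second
trace = [100] * 10 + [100] * 5 + [250] + [180] * 14 + [120] * 30
peaks = normalized_phase_peaks(trace, frame_interval_s=1.0)
print(peaks)
```

Per-cell values obtained in this way can then be averaged across cells to give a mean normalized peak amplitude for each phase.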


SEMA3E challenge 

BAECs in which endogenous PLXND1 was knocked down with siRNA 
were infected with either wild-type or mutant PLXND1-expressing 
adenoviruses. Cells were serum-starved and treated with recombinant 
SEMA3E before processing for immunofluorescence with phalloidin, 
DAPI and anti-vinculin antibody. Images were taken on a Zeiss LSM 880
Airy Scan Confocal microscope and analysed using ImageJ using an
in-house-generated macro to measure the cell area, focal adhesion num-
ber and focal adhesion area. Statistical analyses were performed using
GraphPad Prism 7. Comparisons between groups were assessed by two-
way analysis of variance (ANOVA) with a Tukey multiple-comparisons
post hoc test. Differences were considered significant when P < 0.05.


Site-directed mutagenesis 

To lock the ectodomain of PLXND1 in the ring-like conformation, we 
designed a double mutant by introducing two single point mutations, 
Y517C and A1135C, in the SEMA domain (domain 1) and IPT5 domain
(domain 9), respectively. Site-directed mutagenesis of full-length 
PLXND1, and of the PLXND1 ectodomain, was carried out by multiple- 
step overlap-extension PCR, and the resulting PCR products were 
cloned into a pHLsec vector.


Protein production 

Constructs encoding the ectodomain (residues 47-1271) of mouse 
PLXND1 or double-mutant PLXND1(Y517C/A1135C) were cloned into 
the pHLsec vector in frame with a C-terminal hexahistidine (6His) tag. 
Protein was produced by transient transfection in HEK293T cells at 
37 °C. Conditioned medium was collected five days after transfection 
and buffer was exchanged using a QuixStand diafiltration system (GE 
Healthcare). The double mutant of PLXND1 was secreted at a similar 
level as the wild-type protein. Proteins were purified by immobilized 
metal-affinity chromatography using a HisTrap FF column (GE Health- 
care) followed by size-exclusion chromatography using a Superdex 
200 Increase 10/300 column (GE Healthcare). 


Alexa Fluor labelling of PLXND1(Y517C/A1135C) for validation of 
disulfide-bond formation 


PLXND1(Y517C/A1135C) at a concentration of 10 μM in PBS was labelled
with a 20-fold molar excess of a thiol-reactive fluorescent dye, Alexa
Fluor 488 C5 maleimide (Thermo Fisher Scientific), and the reaction
was allowed to proceed for 24 h at 6 °C in the dark. Unreacted dye was
removed from the labelled protein using a Sephadex G-25 column (GE 
Healthcare). The degree of labelling (n) was determined using Eq. (1):

$$ n = \frac{A_{488}\, M}{\varepsilon\, c} \qquad (1) $$

in which A488 is the absorbance at 488 nm, M is the molecular mass of
the protein, ε is the molar extinction coefficient of the dye and c is the
protein concentration in mg ml−1. Hen egg ovalbumin (GE Healthcare)
was used as a positive control. 
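
As a worked illustration of Eq. (1), the short sketch below computes the degree of labelling from an absorbance reading. The function name and all numerical inputs are hypothetical and chosen only to make the arithmetic easy to follow; a 1 cm path length is assumed, and the extinction coefficient should be taken from the dye manufacturer's data sheet.

```python
def degree_of_labelling(a488, molecular_mass, extinction_coeff, protein_conc_mg_ml):
    """Degree of labelling n = (A488 * M) / (epsilon * c), as in Eq. (1).

    a488: absorbance at 488 nm (1 cm path length assumed)
    molecular_mass: protein molecular mass M in g/mol (Da)
    extinction_coeff: molar extinction coefficient of the dye, in M^-1 cm^-1
    protein_conc_mg_ml: protein concentration c in mg/ml (equal to g/l)
    """
    # A488 / epsilon is the molar dye concentration;
    # c / M is the molar protein concentration; n is their ratio.
    return (a488 * molecular_mass) / (extinction_coeff * protein_conc_mg_ml)


# Hypothetical inputs: 0.5 mg/ml of a 50 kDa protein, epsilon = 71,000
n_example = degree_of_labelling(
    a488=0.71,
    molecular_mass=50_000.0,
    extinction_coeff=71_000.0,
    protein_conc_mg_ml=0.5,
)
print(f"degree of labelling: {n_example:.2f} dye molecules per protein")
```

Because the maleimide dye reacts with free thiols, a value of n close to zero for the double mutant would be consistent with both introduced cysteines being engaged in a disulfide bond.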


Negative-stain electron microscopy 

A drop of 2.5 μl freshly gel-filtrated PLXND1 ectodomain at a concentra-
tion of 1-5 μg ml−1 in 10 mM HEPES, pH 7.5 and 150 mM sodium chloride
was adsorbed to a newly glow-discharged carbon-coated copper grid,
washed with two drops of 50 μl deionized water, and stained with two
drops of 50 μl 0.75% uranyl formate. The excess stain on the grids was
removed with filter paper before air-drying. Samples were imaged
at room temperature using an FEI Tecnai T12 electron microscope
equipped with a LaB6 filament operating at an acceleration voltage
of 120 kV and a dose of 15 electrons per Å2. Images were taken using a
4,000 x 4,000 FEI Eagle CCD camera at a magnification of 57,000x
with under-focus values ranging from 1.0 to 1.5 μm and a pixel size of
2.16 Å. The particle images were normalized, rescaled and filtered before
being subjected to reference-free classification in EMAN2. The PLXND1
structural models were generated manually using The PyMOL Molecu-
lar Graphics System (Schrödinger).


Cloning and adenoviral generation 

Wild-type and mutant PLXND1 were cloned into the pENTR/TOPO entry 
vector of the Gateway System (Invitrogen) using the KOD Hot Start High 
Fidelity polymerase. After confirmation of successful cloning by Sanger 
sequencing, the constructs were sub-cloned into the pAd/CMV/V5-Dest 
destination vector by LR Clonase II reaction. All steps were performed
according to the manufacturer’s instructions. The destination vector 
was linearized by PacI digestion and transfected into HEK293A cells
for adenoviral generation and subsequent amplification according to 
the manufacturer’s instructions. In experiments in which adenoviral 
overexpression was used, endogenous levels of PLXND1 were knocked 
down with the siRNA pool to minimize any background signals. 


Statistics 

Data are mean ± s.e.m. All experiments were performed at least three
times independently. Statistical significance was tested using either 
an ANOVA or unpaired Student’s t-tests. Data were tested for normality 
using the Shapiro-Wilk test and equality of variance using the Levene 
test. Where necessary, data were log-transformed before being ana- 
lysed for statistical significance. All image analysis was performed by 
operators who were blinded to the treatments administered. 
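
For readers who want to reproduce the summary statistic, the sketch below computes mean ± s.e.m. with an optional log transformation. It uses only the Python standard library and is an illustration, not the GraphPad Prism workflow used in the study; the function name and the choice of a base-10 logarithm are assumptions.

```python
import math
from statistics import mean, stdev


def mean_sem(values, log_transform=False):
    """Return (mean, s.e.m.) of the values.

    s.e.m. is the sample standard deviation divided by sqrt(n). With
    log_transform=True the values are log10-transformed first (assumed base).
    """
    data = [math.log10(v) for v in values] if log_transform else list(values)
    if len(data) < 2:
        raise ValueError("at least two values are needed to estimate the s.e.m.")
    return mean(data), stdev(data) / math.sqrt(len(data))


# Hypothetical measurements from three independent experiments
m, sem = mean_sem([1.0, 2.0, 3.0])
print(f"{m:.2f} ± {sem:.2f}")
```

Normality and equal-variance checks such as the Shapiro-Wilk and Levene tests are not part of the standard library and would need a statistics package.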


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The datasets generated during and/or analysed during this study are 
either included within the manuscript or are available from the cor- 
responding author on reasonable request. Source Data for Figs. 1-4 
and Extended Data Figs. 2-11 are provided with the paper. Gel source 
data can be found in Supplementary Fig. 1. 
Extended Data Fig. 1 | Knockout or knockdown of PLXND1 and other genes in
ECs. a-f, ECs were either isolated from Plxnd1fl/fl and Plxnd1iECKO mice, or treated
with siRNAs to knock down PLXND1, NRP1, PIEZO1 and Gαq. Knockdowns and
knockouts were confirmed by western blotting, using GAPDH as a loading
control. g, PLXND1 was knocked down in mouse ECs using a pool of siRNAs,
followed by infection with an adenovirus expressing either β-galactosidase
(LacZ), or wild-type or mutant PLXND1. Protein levels were normalized to
GAPDH. KD, mean knockdown efficiency based on n = 3; KO, mean knockout
efficiency based on n = 3.
Extended Data Fig. 2 | PLXND1 mediates the EC response to fluid shear
stress. a, BAECs were transfected with scrambled or PLXND1 siRNA and
exposed to laminar fluid shear stress (12 dynes cm−2) using a parallel plate
system for the indicated time periods. Phosphorylation of Akt (n = 6), ERK1/2
(n = 5) and eNOS (n = 8) was determined by western blotting and quantified
using Image Studio Lite v.5.2. Data are mean ± s.e.m. P values were obtained by
two-tailed Student’s t-tests using GraphPad Prism. *P < 0.05 relative to the static
condition; *P < 0.05 relative to the shear time point of the respective scrambled
siRNA. b, BAECs were transfected with scrambled or PLXND1 siRNA and
exposed to atheroprotective shear stress for 24 h. Cells were fixed and stained
with phalloidin, DAPI and anti-β-catenin antibodies to visualize actin stress
fibres, nuclei and cell junctions, respectively. Quantification of alignment was
performed using ImageJ; n > 50 cells across 4 biological replicates (exact
sample numbers are provided in the Source Data). Data are mean ± s.e.m.
P values were obtained using two-tailed Student’s t-tests using GraphPad
Prism. ****P < 0.0001.
Extended Data Fig. 3 | Mechanotransduction by PLXND1 is independent of
its ligand-binding functions. a, BAECs were treated with SEMA3E-blocking
antibody or control antibody (1 μg ml−1) and exposed to fluid shear stress for
the indicated times. Phosphorylation of eNOS, Akt and ERK1/2 was determined
by western blotting and quantified using Image Studio Lite v.5.2. n = 3
biological repeats. Data are mean ± s.e.m. b, BAECs were treated with SEMA3E-
blocking antibody or control antibody for 1 h before exposure to SEMA3E for
[…] vinculin antibody, then stained with phalloidin and DAPI to visualize focal
adhesions, actin stress fibres and nuclei, respectively. EC collapse was
quantified by measuring the cell area using ImageJ. Data are mean ± s.e.m.
Significance was determined by ANOVA with a Tukey post hoc test using
GraphPad Prism. ****P < 0.0001. n = 59-82 cells across 3 independent
experiments (exact sample numbers are provided in the Source Data). Scale
bar, 50 μm.
Extended Data Fig. 4 | Lipid-profile analysis and expression of inflammatory
markers in the aortic arch. a, Body weights and lipid-profile analysis of
Plxnd1fl/flApoe−/− and Plxnd1iECKOApoe−/− mice after 10 weeks of high-fat diet
feeding (analysed at 16-17 weeks of age); n = 8. Data are mean ± s.e.m.

b, Representative en face preparations of aortic arches immunostained for 


Extended Data Fig. 6 | Mechanical force on PLXND1 results in integrin
activation, whereas ligand stimulation causes ECs to collapse. a, BAECs were
incubated with anti-PLXND1-coated beads and subjected to force (10 pN) for
5 min. ECs were fixed and stained with HUTS4 antibody to mark ligated β1
integrin. Mean fluorescence intensity was quantified using ImageJ software.
Values were normalized to the no force condition. Locations of the beads are
highlighted in yellow circles. n = 50 cells per condition from 3 independent
experiments. Data are mean ± s.e.m. P values were obtained using two-tailed
Student’s t-tests using GraphPad Prism. ****P < 0.0001. Scale bar, 10 μm.
b, BAECs were incubated with SEMA3E or vehicle, fixed and stained with anti-
vinculin antibody to mark focal adhesions. Focal adhesion number was
quantified using ImageJ software. Values were normalized to the vehicle
condition. n = 30 cells per condition from 3 independent experiments. Data are
mean ± s.e.m. P values were obtained using two-tailed Student’s t-tests using
GraphPad Prism. ****P < 0.0001. Scale bar, 10 μm.
Extended Data Fig. 7 | PLXND1 colocalizes and associates with members of
the junctional mechanosensory complex, and its levels are not regulated by
flow, in contrast to SEMA3E. a, The descending thoracic aorta or the inner
curvature of aortic arches were isolated and prepared en face from wild-type
mice and stained for PLXND1, PECAM-1 and DAPI. Quantification of PLXND1
levels was performed by fluorescence intensity measurement using ImageJ;
4-6 images were taken of tissue collected from n = 4 mice. Data are
mean ± s.e.m. Scale bar, 20 μm. b, The descending thoracic aorta was isolated
and prepared en face from Plxnd1iECKO mice and stained for PLXND1 expression
to assess the specificity of the PLXND1 immunostain. n = 3 mice all showed
similar results. c, The descending thoracic aorta or the inner curvature of aortic
arches were isolated and prepared en face from wild-type mice and stained for
SEMA3E and PECAM-1 expression and with DAPI. Quantification of SEMA3E
levels was performed by fluorescence intensity measurement using ImageJ;
4-6 images were taken using tissue collected from n = 3 mice. Data are
mean ± s.e.m. P values were obtained using two-tailed Student’s t-test using
GraphPad Prism. *P < 0.05. Scale bar, 20 μm. d, Mouse ECs were exposed to
shear stress for the indicated times or left as static controls before
immunoprecipitating PLXND1 and analysing its association with the junctional
mechanosensory complex (PECAM, VE-cadherin and VEGFR2) as well as PI3K/
p85. n = 3 independent experiments.


Extended Data Fig. 8 | Relationship between PLXND1 and other established
mechanosensors. a, Pecam1* and Pecam1“ ECs were incubated with anti-
PLXND1-coated beads and subjected to force application for 5 min before the
analysis of the phosphorylation of vinculin. n = 3 independent experiments.
*P < 0.05. b, c, Mouse ECs were treated with siRNAs against Piezo1 and Gαq,
incubated with anti-PLXND1-coated beads and subjected to force application
for 5 min before the analysis of the phosphorylation of vinculin. n = 3. *P < 0.05.
Data are mean ± s.e.m. P values were obtained using two-tailed Student’s t-tests
using GraphPad Prism.


Extended Data Fig. 9 | Force application on other members of the PLXND1
mechanocomplex does not elicit a mechanotransduction response.
a, b, Mouse ECs were incubated with anti-VEGFR2 (a) or anti-NRP1 (b) antibody-
coated beads and subjected to force (10 pN) for the indicated time periods.
Phosphorylation of Akt and ERK1/2 was determined by western blotting and
quantified using Image Studio Lite v.5.2. n = 3 biological repeats. Data are
mean ± s.e.m.
Extended Data Fig. 10 | Validation of the PLXND1 mutant. a, Negative-stain 
two-dimensional class averages of wild-type PLXND1 were obtained by 
classifying 1,305 particles into 10 classes. Scale bar, 10 nm. b, Negative-stain 
two-dimensional class averages of mutant PLXND1 were obtained by 
classifying 1,357 particles into 10 classes. c, The double mutant of PLXND1 was 
labelled with a thiol-reactive fluorescent dye, Alexa Fluor 488 C5 maleimide. 
The degree of labelling shows that the vast majority of PLXND1 mutant 
molecules form the disulfide-linked bond and thus the ring of the majority of 
PLXND1-mutant molecules appears to be locked by the covalent bond. The 
degree of labelling for the hen egg ovalbumin, which we used as a positive 
control, is close to the number of free cysteines in ovalbumin. n=3 
independent experiments. Data are mean ± s.e.m. P values were calculated by 
two-tailed Student's t-tests using GraphPad Prism. ***P<0.001. 
β-galactosidase (Ad.LacZ), wild-type or mutant PLXND1 and incubated with 
tailed Student's t-tests using GraphPad Prism. *P<0.05. 
Sample size: To determine the number of animals required to complete the proposed 
studies, we performed power analyses based on literature review and data from our laboratory using the SPSS statistical software. Excessively high 
numbers were avoided to keep the number of experimental animals as low as possible. All cell-based experiments had at least 3 biological 
replicates. 

Data exclusions: No samples or animals were excluded. 
Replication: Experimental findings were successfully replicated, by either the same author(s) or a different author. 

Randomization: Age/sex-matched littermates were randomly assigned to groups. For magnetic force application experiments, cells with 1-3 
semi-automated fashion. 


Reporting for specific materials, systems and methods 


Materials & experimental systems (n/a | Involved in the study) 
Unique biological materials 
Antibodies 
Eukaryotic cell lines 
Palaeontology 
Animals and other organisms 
Human research participants 

Methods (n/a | Involved in the study) 
ChIP-seq 
Flow cytometry 
MRI-based neuroimaging 


Unique biological materials 


Policy information about availability of materials 


Obtaining unique materials: Mouse cell lines and adenoviral constructs unique to this study can be obtained from the corresponding author upon reasonable 
request. 


Antibodies 


Antibodies used: phospho(p)-ERK1/2 T202/Y204 (9106, Cell Signaling Technology, 1:1000), total(t)-ERK1/2 (9102, Cell Signaling Technology, 
1:1000), pAkt S473 (4060, Cell Signaling Technology, 1:1000), tAkt (4691, Cell Signaling Technology, 1:1000), p-eNOS S1177 (9571, 
Cell Signaling Technology, 1:1000), pVEGFR2 Y1175 (2478, Cell Signaling Technology, 1:1000), tVEGFR2 (2479, Cell Signaling 
Technology, 1:1000), t-eNOS (610296, BD Biosciences, 1:1000), p-Vinculin Y822 (ab200825, Abcam, 1:1000), t-Vinculin (V9131, 
Sigma Aldrich, 1:1000), PI3K/P85 (06195, Upstate, 1:1000), integrin αvβ3 (clone LM609, Merck, 1:1000), Shc 
(ab15039, Abcam, 1:1000), VE-cadherin (C-19, sc-6458, Santa Cruz Biotechnology, 1:1000), Fibronectin (HFN7.1, Developmental 
Studies Hybridoma Bank, 1:50), CP1 (ab7207, Abcam, 1:50), beta-catenin (610153, BD Transduction Laboratories, 1:70), 
Alexafluor 488 goat anti-mouse (A11001, Invitrogen, 1:150), Alexafluor 568 goat anti-mouse (A11061, Invitrogen, 1:150), 
Alexafluor 680 goat anti-rabbit (A21076, Invitrogen, 1:10000), Alexafluor 790 goat anti-mouse (A11375, Invitrogen, 1:10000), 
A21058, Invitrogen, 1:10000), Plexin D1 (E13, sc-46245, Santa Cruz Biotechnology), Plexin D1 
(PA5-47012, ThermoFisher Scientific) 
Cruz Biotechnology) used for magnetic force application experiments was validated for this application in-house. 
The proteasome is a major proteolytic machine that regulates cellular proteostasis 
through selective degradation of ubiquitylated proteins. A number of ubiquitin- 
related molecules have recently been found to be involved in the regulation of 
biomolecular condensates or membraneless organelles, which arise by liquid-liquid 
phase separation of specific biomolecules, including stress granules, nuclear speckles 
and autophagosomes, but it remains unclear whether the proteasome also 
participates in such regulation. Here we reveal that proteasome-containing nuclear 
foci form under acute hyperosmotic stress. These foci are transient structures that 
contain ubiquitylated proteins, p97 (also known as valosin-containing protein (VCP)) 
and multiple proteasome-interacting proteins, which collectively constitute a 
proteolytic centre. The major substrates for degradation by these foci were ribosomal 
proteins that failed to properly assemble. Notably, the proteasome foci exhibited 
properties of liquid droplets. RAD23B, a substrate-shuttling factor for the 
proteasome, and ubiquitylated proteins were necessary for formation of proteasome 
foci. In mechanistic terms, a liquid-liquid phase separation was triggered by 
multivalent interactions of two ubiquitin-associated domains of RAD23B and 
ubiquitin chains consisting of four or more ubiquitin molecules. Collectively, our 
results suggest that ubiquitin-chain-dependent phase separation induces the 
formation of a nuclear proteolytic compartment that promotes proteasomal 
degradation. 


To enable visualization of proteasomes in live cells, we generated 
derivatives of the HCT116 colon cancer cell line in which endogenous 
proteasome subunits, the core particle subunit proteasome subunit 
β type-2 (PSMB2, also known as β4), and regulatory particle subunit 
26S proteasome non-ATPase regulatory subunit 6 (PSMD6, also known 
as RPN7), were labelled with an eGFP or FusionRed fluorescent tag 
(Extended Data Fig. 1). Consistent with previous studies, the 
proteasomes were primarily observed in the nucleoplasm and cytoplasm of 
highly proliferating cells. Over the course of a series of experiments, we 
unexpectedly observed that under hyperosmotic stress, proteasomes 
rapidly formed multiple foci in the nucleus (Fig. 1a, Supplementary 
Video 1). We confirmed that foci formed in wild-type (WT) HCT116 
cells using an antibody against the endogenous proteasome (Fig. 1a, 
Extended Data Fig. 2a). Various osmolytes (sucrose, glucose and NaCl) 
stimulated foci formation, and the osmolarity required to induce 
the response was 100 mOsmol l⁻¹, close to the value observed during 
physiological changes associated with type II diabetes (Extended 
Data Fig. 2b). PSMD6-eGFP also formed foci, and the foci were stained 
with an activity-based probe, suggesting the involvement of active 26S 
proteasomes (Fig. 1a, Extended Data Fig. 2c). Consistently, a snapshot 
obtained by cryo-electron tomography revealed the clustering of 
26S proteasomes in the nucleus upon osmotic stimulation (Fig. 1b, 
Extended Data Fig. 2d). We also observed these foci in immortalized 
retinal pigment epithelial (RPE-1) cells and mouse embryonic stem 
(ES) cells, suggesting that hyperosmotic-stress-induced formation of 
proteasome foci is a universal phenomenon (Extended Data Fig. 2e). 
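To relate the osmolyte concentrations used in the experiments to the osmolarity threshold quoted above, the following is a minimal sketch that assumes ideal solutions: sucrose and glucose contribute one osmotically active particle per molecule and NaCl dissociates fully into two. The particle counts and the function name are illustrative assumptions, not values or code taken from the paper.

```python
# Idealized number of osmotically active particles per formula unit.
# Real solutions deviate slightly (osmotic coefficients are ignored here).
PARTICLES = {"sucrose": 1, "glucose": 1, "NaCl": 2}


def added_osmolarity_mosm(solute, molar):
    """Ideal osmolarity (mOsmol per litre) added by `molar` mol/l of solute."""
    if solute not in PARTICLES:
        raise KeyError(f"unknown solute: {solute}")
    return PARTICLES[solute] * molar * 1000.0


# 0.2 M sucrose, the stimulus used throughout the figures.
print(added_osmolarity_mosm("sucrose", 0.2))
# 0.1 M NaCl gives the same ideal osmotic load as 0.2 M sucrose.
print(added_osmolarity_mosm("NaCl", 0.1))
```

Under these assumptions, 0.2 M sucrose adds roughly 200 mOsmol l⁻¹ on top of the medium, which is above the threshold reported in the text.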


Proteasome foci are sites of proteolysis 


Several nuclear bodies are related to the proteasome, such as promyelocytic 
leukemia protein (PML) nuclear bodies and Cajal bodies, but these 
bodies did not colocalize with proteasome foci (Extended Data Fig. 3a, b). 
Instead, proteasome foci colocalized almost completely with lysine 48 
(K48)-linked ubiquitin chains, a major proteolytic signal for the 
proteasome, but not with non-proteolytic K63-linked chains (Fig. 1c, Extended 
Data Fig. 3c). To obtain further functional insights, we performed 
time-lapse imaging of PSMB2-eGFP cells (Fig. 1d). Five minutes after the 
addition of 0.2 M sucrose, the number of foci increased to a maximum of around 
30 per nucleus, and the diameter of the foci increased to approximately 
500 nm within 30 min. Subsequently, the foci gradually disappeared over 
Fig. 1 | Hyperosmotic stress induces the formation of proteasome foci in the 
nucleus. a, HCT116 cells, PSMB2-eGFP or PSMB2-FusionRed cells were 
stimulated with 0.2 M sucrose. Endogenous proteasome activity was detected 
using a proteasome probe (Me4BodipyFL-Ahx3Leu3VS) and localization of 
endogenous proteasomes was determined using PSMA1 (α6) antibody at the 
indicated times. Scale bars, 10 μm. b, Cryo-electron tomography (cryo-ET) 
image of proteasome foci in the nuclear region following stimulation with 
0.2 M sucrose. Scale bar, 0.1 μm. c, Left, PSMB2-eGFP cells were stimulated 
with 0.2 M sucrose for 30 min and endogenous K48Ub was detected with K48- 
ubiquitin antibody. Right, line profiling of a representative section of the cell, 
indicated by a white dashed line. Scale bars, 10 μm. d, Left, time-lapse images of 
live PSMB2-eGFP cells. Cells were treated with proteasome inhibitor MG-132 
(50 μM, 1 h prior), b-AP15 (1 μM, simultaneously) or ubiquitin-activating 

the course of 3 h. When the cells were treated with the proteasome inhibitor 
MG-132 or b-AP15, the number and size of the foci increased, and clearance 
was significantly delayed. By contrast, pre-treatment with the ubiquitin 
E1 inhibitor MLN-7243 almost completely inhibited foci formation 
(Fig. 1d, bottom, Extended Data Fig. 3d). Thus, ubiquitylated substrates 
are required for the formation of proteasome foci and proteasome activity 
is necessary for their clearance. Together, these observations suggested 
that hyperosmotic-stress-induced proteasome foci are sites of proteolysis. 


Orphan RP is degraded in proteasome foci 


A slight increase in the level of ubiquitylated proteins was observed 
following hyperosmotic stimulation, and a mass spectrometry-based 


enzyme (E1) inhibitor MLN-7243 (1 μM, 1 h prior), and then stimulated with 0.2 M 
sucrose. Scale bars, 10 μm. Representative images from two independent 
experiments. Right, changes of the proteasome foci number per cell and foci 
diameter in untreated control cells (grey) and MG-132- (magenta), b-AP15- 
(green) or MLN-7243- (orange) treated cells. n represents cell numbers 
(control: n=119, 121, 116, 117, 120, 120, 123, 126, 128, 125 and 125; MG132: n=94, 99, 
99, 95, 95, 91, 93, 92, 94, 100 and 95; b-AP15: n=81, 85, 80, 80, 85, 88, 90, 93, 103, 
93 and 102; MLN-7243: n=47, 43, 43, 42, 42, 42, 42, 40, 39, 39 and 39, at 0, 1, 3, 5, 
10, 30, 60, 90, 120, 150 and 180 min after sucrose treatment, respectively). Data 
are mean ± s.e.m.; one-way ANOVA with Dunnett's test (number) and 
median ± s.e.m., Friedman with Dunn's test (diameter). P values are shown in the 
figure. Representative images from four (a, left), three (a, middle and right) or 
two (b, d) independent experiments. 


ubiquitylome analysis identified changes in several housekeeping 
proteins, including linker histones, HSP90 and ribosomal proteins (RPs) 
(Extended Data Fig. 4a, b). We subsequently focused on RPs. Ribosomes 
are constitutively produced at around 7,500 molecules per minute 
in cultured human cells via complicated assembly processes within 
the nucleus, and orphan RPs that fail to incorporate into ribosomes 
are degraded by the ubiquitin-proteasome system. Therefore, 
we hypothesized that ribosome biosynthesis might be vulnerable to 
hyperosmotic stress. Indeed, electron microscopy analysis showed 
the disappearance of the nucleolar dense fibrillar compartment (DFC), 
where pre-ribosomal RNA (rRNA) and RPs are assembled, upon 
hypertonic stress (Fig. 2a). We also observed the emergence of multiple 
hyperdense structures near proteasome foci; these structures disappeared 
Fig. 2 | Ribosomal proteins are degraded in proteasome foci. a, Changes in 
the ultrastructural appearance of type 2 granule matrix in the nucleolus and 
nucleoplasm, as determined by transmission electron microscopy. Dense 
fibrillar compartment structures (arrow) in the nucleolus disappeared and very 
dense granules (arrowhead) in the nucleoplasm were observed after 
stimulation of PSMB2-eGFP cells with 0.2 M sucrose. Nuc, nucleus. Scale bars, 
5 μm. Representative images from two independent experiments. b, PSMB2- 
FusionRed cells stably expressing eGFP-3xFlag-fused RPL29 (L29), RPL15 (L15) 
or RPS2 (S2) were stimulated with or without 0.2 M sucrose for 30 min, 
immunoprecipitated (IP) with Flag antibody and immunoblotted as indicated. 
Representative result from three independent experiments. For gel source 
data, see Supplementary Fig. 1. c, Colocalization of proteasome foci and 
endogenous ribosomal proteins. PSMB2-eGFP cells were stimulated with 0.2 M 
sucrose for 30 min, and endogenous ribosomal proteins were detected with 
specific antibodies. The mean value of the Pearson correlation coefficient in 
the nucleoplasm is shown in the image (n=10 cells in two fields of view). Scale 
bars, 10 μm. Each graph represents the normalized fluorescence distribution 
over the white dashed lines. Representative results from two independent 
experiments. See Extended Data Fig. 5a for additional ribosomal components. 
d, Time-lapse images of single foci in live HCT116 cells stably expressing 
PSMB2-FusionRed and eGFP-RPL29 (PSMB2-FusionRed/RPL29-eGFP). Cells 
were stimulated with 0.2 M sucrose, with or without pretreatment with the E1 
inhibitor MLN-7243 (1 μM, 1 h prior). Scale bars, 0.5 μm. Representative results 
from five (control) or three (MLN-7243) independent experiments. See also 
Supplementary Videos 2 and 3. 
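The caption for panel c reports colocalization as a Pearson correlation coefficient between two fluorescence channels. The source does not say which software computed it, so the following is only a minimal sketch of the statistic itself, applied to two hypothetical lists of paired pixel intensities (one value per pixel in each channel):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical intensities for the same pixels in two channels.
proteasome_channel = [10, 40, 80, 35, 12]
ribosomal_channel = [8, 30, 70, 45, 15]
print(round(pearson(proteasome_channel, ribosomal_channel), 3))
```

A value near 1 means the two signals rise and fall together across pixels, a value near 0 means no linear relationship. In practice the pixel lists would be restricted to a region of interest such as the nucleoplasm, as the caption describes.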


in a proteasome activity-dependent manner after 4 h (Extended Data 
Fig. 4c, d). Northern blot analysis revealed that hypertonic stress caused 
a reduction in the Pol I-transcribed pre-rRNA and its processed forms 
(Extended Data Fig. 4e). In addition, stably expressed RPs in cells were 
constitutively ubiquitylated, and notably, ubiquitylation levels were 
further increased by sucrose treatment (Fig. 2b). Thus, hyperosmotic 
stress induces acute nucleolar stress, thereby causing failure of 
pre-ribosome assembly. 

To determine whether RPs localize to proteasome foci, we performed 
immunofluorescence staining, and found that several RPs (RPL7A, 
RPL15, RPL29 and RPS2) modestly colocalized with proteasome foci 
(Fig. 2c, Extended Data Fig. 5a). In time-lapse imaging of the PSMB2- 
FusionRed cells stably expressing RPL29-eGFP, we observed a rapid 
emergence of RPL29 condensates and their degradation at proteasome 
foci (Fig. 2d, Extended Data Fig. 5b, Supplementary Video 2). RPL29 
structures were also observed in the presence of MLN-7243, but these 
were smaller, and their clearance was markedly delayed (Fig. 2d, Sup- 
plementary Video 3). Treatment with the Pol I inhibitor CX-5461 or 
transient overexpression of RPs further increased the size of protea- 
some foci, suggesting that the condensates formed from unassembled 
orphan RPs produced in the nucleoplasm (Extended Data Fig. 5c, d). 


Proteasome foci are liquid droplets 


Time-lapse imaging of the PSMB2-FusionRed cells stably expressing 
eGFP-ubiquitin revealed that proteasome foci have liquid droplet-like 
properties: small foci that contained proteasomes and ubiquitin 
suspended in the nucleoplasm fused into larger foci, and their circularity 
was 0.998 (Fig. 3a, Supplementary Video 4). The number of preformed 
foci decreased upon addition of 1,6-hexanediol, an aliphatic alcohol 
that destabilizes liquid droplets, but not by ammonium acetate, which 
disrupts RNA gelation (Fig. 3b). Fluorescence recovery after 
photobleaching (FRAP) experiments revealed that approximately 90% of 
separation of the proteasome. a, p97, RAD23B and UBE3A localized to 
proteasome foci. PSMB2-eGFP cells were stimulated with 0.2 M sucrose and 
observed by immunofluorescence with indicated antibodies. Scale bars, 
10 μm. b, PSMB2-eGFP cells treated with p97 inhibitor NMS-873 (1 μM, 
1 h prior), PSMB2-eGFP cells lacking RAD23B or UBE3A were stimulated with 
0.2 M sucrose for 30 min. The graph indicates the number of proteasome foci 
per cell and the diameter of individual foci. n represents cell numbers (control, 
176 cells; NMS-873, 168 cells; RAD23B-KO, 112 cells; UBE3A-KO, 108 cells). Data 
are mean ± s.d., ****P<0.0001, ***P<0.0002, **P<0.0021 by Kruskal-Wallis 
with Dunn's test. c, RAD23B-KO PSMB2-eGFP (PSMB2-eGFP/RAD23B-KO) cells 
were stimulated with 0.2 M sucrose for 30 min, and endogenous K48Ub was 
detected with K48-ubiquitin antibody. Scale bars, 10 μm. d, Top, liquid droplets 
formed 90 min after mixing of 20 μM Cy5-K48Ub chains with 20 μM Cy3- 


proteasomes were rapidly exchanged into and out of the foci, further 
supporting the idea that proteasome foci are liquid droplets (Fig. 3c). 
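Two quantities in this paragraph, the circularity of a focus and the fraction of proteasomes that exchange in a FRAP experiment, have standard definitions that can be sketched briefly. The formulas below are the conventional ones (circularity = 4πA/P², mobile fraction = recovered signal divided by bleached signal). The source does not state which exact definitions or software were used, so the function names and the numeric inputs are illustrative assumptions.

```python
import math


def circularity(area, perimeter):
    """Shape factor 4*pi*A / P**2: 1.0 for a perfect circle, lower otherwise."""
    if area <= 0 or perimeter <= 0:
        raise ValueError("area and perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2


def mobile_fraction(pre_bleach, post_bleach, plateau):
    """Fraction of the bleached fluorescence that recovers at the plateau."""
    bleached = pre_bleach - post_bleach
    if bleached <= 0:
        raise ValueError("post-bleach intensity must be below pre-bleach")
    return (plateau - post_bleach) / bleached


# A circle of radius 0.25 (arbitrary units) has circularity 1 by construction.
r = 0.25
print(circularity(math.pi * r ** 2, 2 * math.pi * r))

# Hypothetical normalized FRAP intensities: before bleach, just after, plateau.
print(mobile_fraction(pre_bleach=1.0, post_bleach=0.1, plateau=0.91))
```

With the hypothetical intensities shown, the mobile fraction comes out at about 0.9, the same order as the roughly 90% exchange described in the text. A circularity close to 1, such as the reported 0.998, indicates near-spherical foci, as expected when surface tension shapes a liquid droplet.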


p97 and RAD23B regulate proteasome foci 


A mass-spectrometry-based proteomic screen for proteasome- 
interacting proteins identified ubiquitin-selective chaperone p97 
(also known as VCP), substrate-shuttling factor RAD23B and 
ubiquitin ligase UBE3A (also known as E6-AP) (Source Data of Fig. 4a). 
These proteins extensively colocalized with the proteasome at the 
foci (Fig. 4a, Extended Data Fig. 6a, b). Inhibition of p97 activity by 
NMS-873 caused an increase of around 50% in foci size, suggesting a 
positive role of p97 for degradation of ubiquitylated substrates in 
proteasome foci (Fig. 4b). Indeed, NMS-873, like b-AP15, increased the size 
of RPL29 condensates (Extended Data Fig. 7, Supplementary Videos 5, 
6). Knockout (KO) of UBE3A caused a reduction of around 30% in the 
number of foci, suggesting that UBE3A regulates the proteasome itself 
or ubiquitylation of substrate proteins in the foci (Fig. 4b, Extended 
Data Fig. 8a). Notably, formation of proteasome foci was markedly 
with different lengths. Red dots indicate phase separation and blue dots 
indicate no phase separation. g, Current model of formation and clearance of 
proteasome-containing liquid droplets. In a, c-e, representative results from 
two independent experiments. 


attenuated in RAD23B-KO cells. No significant effects were observed 
on siRNA-mediated knockdown of ubiquitin-like proteins (UBQLNs) 
and RAD23A, other shuttling factors, or XPC, which functions with 
RAD23 proteins in the DNA nucleotide excision repair pathway 
(Extended Data Fig. 6c). 

We initially predicted that ubiquitylated proteins would form foci 
before the RAD23B-dependent recruitment of proteasomes. 
However, we found that neither the proteasome nor ubiquitin formed 
the foci in RAD23B-KO cells (Fig. 4c, Extended Data Fig. 8b). This 
was not due to a decrease in the level of ubiquitylated proteins 
(Extended Data Fig. 8a). RAD23 family proteins have a 
ubiquitin-like (UBL) domain and two ubiquitin-associated (UBA) domains that 
bind the proteasome and ubiquitin chains, respectively. Adding 
back mutant RAD23B to RAD23B-KO cells revealed that formation of 
ubiquitin-positive foci was UBA domain-dependent (Extended Data 
Fig. 8b). Because overexpression of RAD23A rescued RAD23B-KO 
cells, the difference in ability to form foci can be explained by 
differences in the expression levels of the two proteins (Extended 
Data Fig. 8c, d). 
RAD23B drives LLPS of ubiquitin chains 


These results raised the possibility that RAD23B is directly involved 
in liquid-liquid phase separation (LLPS) of ubiquitylated proteins. To 
determine this in vitro, we incubated fluorescently labelled RAD23B 
and K48-linked polyubiquitin (K48Ub) chains in the presence of 
polyethylene glycol (PEG), used as a crowding reagent (Extended Data 
Fig. 9a-c). On mixing, RAD23B and K48Ub chains formed spherical 
condensates in which the two different fluorescent signals were 
uniformly distributed (Fig. 4d). FRAP analysis revealed rapid exchanges 
of both RAD23B and K48Ub chains, suggesting co-phase separation 
of RAD23B and K48Ub chains (Extended Data Fig. 9d). Consistent 
with the properties of liquid droplets, small condensates fused into 
larger ones of up to micrometre size (Fig. 4e, Supplementary Video 7). 
RAD23B lacking the UBL domain caused formation of amorphous 
protein aggregates, whereas RAD23B lacking the UBA domains did 
not form condensates at all (Fig. 4d). As RAD23B prefers K48Ub chains 
with four or more ubiquitin molecules, co-phase separation of 
RAD23B/K48Ub chains was dependent not only on the 
concentration of each protein but also on the length of K48Ub chains (Fig. 4f, 
Extended Data Fig. 9e). Although long K63-linked ubiquitin chains 
could form condensates with RAD23B, the efficiency was lower than 
that of K48Ub chains (Extended Data Fig. 9e). Thus, multivalent 
interactions between long K48Ub chains and two UBA domains of RAD23B 
drive liquid-liquid phase separation. 


Discussion 


In this study, we identified a proteasome-containing structure that is 
induced by hyperosmotic stress. The fluid organization arises from 
LLPS of ubiquitylated proteins and RAD23B, followed by proteasome 
recruitment. Proteasome condensates were prominently observed in 
the nucleoplasm, probably because hyperosmotic stress results in a 
further increase in the nuclear concentration of proteasomes, RAD23B 
and ubiquitylated substrates, and in particular, ubiquitylated orphan 
RPs, owing to nucleolar stress (Fig. 4g). Although its functional 
importance is not fully understood, the condensation appears to facilitate 
proteasomal degradation, because ribosomal condensates were 
stabilized by inhibition of the proteasome or p97 (Extended Data Fig. 7). 
Moreover, in RAD23B-KO cells, as in cells treated with E1 inhibitor, 
small amorphous structures of RPL29 were observed, suggesting that 
condensation of ubiquitylated proteins might protect against protein 
aggregation (Extended Data Fig. 7). Given that unassembled RPs 
stimulate p53 activation, failure of ribosomal condensate formation might 
cause apoptosis. Indeed, RAD23B-KO cells underwent apoptosis in 
response to mild hyperosmotic stress (Extended Data Fig. 8e, f). 
Conversely, recent studies showed a conversion from liquid-like droplets 
to solid-like assemblies of aggregation-prone proteins, most of which 
the proteasome can degrade only in reversible aggregated forms. 
In this context, acute hyperosmotic stress may risk irreversible 
accumulation of protein aggregates, especially when the proteasome or 
p97 activity is reduced. 

It remains unclear whether multivalent interactions between 
ubiquitin chains and ubiquitin-binding proteins universally induce 
LLPS in cells. Given that cells contain numerous ubiquitin-binding 
proteins that regulate multiple cellular pathways, and in light of 
the profound functional consequences of biomolecular condensa- 
tion, it will be of great interest to investigate their ability to pro- 
mote LLPS of ubiquitylated proteins as well as their physiological 
consequences. 
Methods 


No statistical methods were used to predetermine sample size. The 
experiments were not randomized. The investigators were not blinded 
to allocation during experiments and outcome assessment. 


Cell culture 

HCT116 cells (human colorectal carcinoma, CCL-247) and hTERT RPE-1 
cells (human telomerase-immortalized retinal pigmented epithelial, 
CRL-4000), both obtained from ATCC (Manassas), were maintained at 
37 °C in Dulbecco's modified Eagle's medium (Sigma) supplemented 
with 10% (v/v) fetal bovine serum (FBS) (Biowest), 1 mM sodium 
pyruvate (Thermo Fisher Scientific), and nonessential amino acids (Thermo 
Fisher Scientific) in an incubator with a 5% CO2 atmosphere. E14TG2a 
mouse ES cells purchased from ATCC (CRL-1821) were cultured in Dul- 
becco’s modified Eagle’s medium (Sigma) supplemented with 10% (v/v) 
Embryonic Stem-Cell FBS (Thermo Fisher Scientific), 1 mM sodium 
pyruvate, nonessential amino acids, 0.1 mM 2-ME (Sigma), 15% (v/v) 
FBS-free medium supplement (Thermo Fisher Scientific), and 10³ U ml⁻¹ 
ESGRO Leukaemia Inhibitory Factor (Merck). 


Inhibitors and reagents 

Formation of proteasome foci was induced by NaCl (Wako), 
sucrose (Wako) or glucose (Wako) at the indicated concentrations. 
Me4BodipyFL-Ahx3Leu3VS fluorescent proteasome probe (UbiQ 
Bio) was used to detect endogenous proteasome activity. PSMB2- 
FusionRed cells were treated with 1 μM Me4BodipyFL-Ahx3Leu3VS 
for 1 h and washed with PBS, after which formation of proteasome 
foci was induced with 0.2 M sucrose. Ammonium acetate (Wako) and 
1,6-hexanediol (Sigma) were used to distinguish liquid-like and 
solid-like states of proteasome foci in living cells. Hexanediol (1%) and 0.1 M 
ammonium acetate were added 2 min after induction of foci formation 
by 0.2 M sucrose in PSMB2-eGFP cells. Because hexanediol can cause 
various deleterious effects on cells, treatment with this reagent was 
minimized. The inhibitors used in this study were as follows: 
bortezomib (LC Laboratories), MG-132 (Peptide Institute), b-AP15 (LifeSensors), 
MLN-7243 (Active Biochem), NMS-873 (Selleck Chemicals), 
DBeQ (Sigma) and CX-5461 (ChemScene). 


MTS assay for cell viability 

Cells were treated with different concentrations of DBeQ for 4 h, and cell 
viability was analysed using the CellTiter 96 AQueous One Solution Cell 
Proliferation Assay kit (Promega). Absorbance at 490 nm was measured 
on an EnSpire 2300 multimode plate reader (PerkinElmer). 


Plasmids and plasmid transfection 

Transfections of HCT116 cells with the indicated plasmids were 
performed using Lipofectamine 2000 (Thermo Fisher Scientific). Human 
RPL7A and RPS2 were cloned into pcDNA3-eGFP (Addgene 13031) with a 
GGGS linker sequence and 3xFlag to yield pcDNA3-RPL7A-eGFP-3xFlag 
and pcDNA3-RPS2-eGFP-3xFlag, respectively. To generate pcDNA3- 
RAD23A-FusionRed-3xFlag, human RAD23A was cloned with a GGGS 
linker sequence and 3xFlag into pcDNA3-eGFP in which the eGFP gene 
was replaced with FusionRed (Evrogen, JSC). To construct donor 
vectors for TALEN knock-in (KI), ~1,000 bp upstream of the target site, 
including the last exon without the stop codon, a linker sequence (GGGS 
linker), eGFP or FusionRed, 3xFlag tag, SV40 polyA, selection marker, 
and ~1,000 bp of downstream sequence were tandemly flanked by PCR 
and inserted into pBlueScript (Stratagene). 


RNA interference 

siRNA oligos were obtained from ON-TARGETplus SMARTpool (Dharmacon). 
The target sequences are as follows: RAD23A#1, GCTCTGAGTATGAGACGAT; 
RAD23A#2, GAAGATAGAAGCTGAGAAG; RAD23A#3, GATCTTGAGTGACGATGTC; 
RAD23A#4, GAAGAACTTTGTGGTCGTC; RAD23B#1, GCAGATAGGTCGAGAGAAT; 
RAD23B#2, GAACGAGAGCAAGTAATTG; RAD23B#3, GAAGTGGTCATATGAACTA; 
RAD23B#4, CAACAACCCTGACAGAGCA; UBQLN1#1, CATCAACTCCTAATAGTAA; 
UBQLN1#2, GTACTACTGCGCCAAATTT; UBQLN1#3, AGACAAACGTTGGAACTTG; 
UBQLN1#4, CTGAGTAGCTTGGGTTTGA; UBQLN2#1, TAAGGAAGCGATTTCGAAA; 
UBQLN2#2, CTGAATAGCCCGCTGTTTA; UBQLN2#3, CCAAACCGATCAGCTAGTG; 
UBQLN2#4, GACATTAGCCACTGAAGCA; UBQLN4#1, GCTGAGAATATGACGGCAA; 
UBQLN4#2, CCAATGAAGCTAAGCGCCA; UBQLN4#3, GAGGAGGGAATGAGATTAT; 
UBQLN4#4, CCAACCAATGCTAGAATTT; PML#1, GGAAAGATGCAGCTGTATC; 
PML#2, GAGCTCAAGTGCGACATCA; PML#3, GGACATGCACGGTTTCCTG; 
PML#4, GCAACCAGTCGGTGCGTGA; XPC#1, GCAAATGGCTTCTATCGAA; 
XPC#2, TGAAATATGAGGCCATCTA; XPC#3, GAGAAGTACCCTACAAGAT; 
XPC#4, GGAGGGCGATGAAACGTTT. For 
non-targeting control siRNA, we used ON-TARGETplus non-targeted 
siRNA#4 (Dharmacon). siRNAs were transfected into cells using 
Lipofectamine RNAiMAX (Thermo Fisher Scientific). After 24 h of 
transfection, the medium was replaced, and the cells were grown for an additional 
24 h before analysis. 


Generation of KI cell lines 

PSMB2-eGFP-3xFlag- or PSMB2-FusionRed-KI HCT116 cells were 
generated by TALEN KI using the Platinum Gate TALEN Kit, a gift from T. 
Yamamoto (Addgene 1000000043). TALEN-targeting sequences 
(PSMB2-1: TTTCCTTCCCCAAACAGGGCT; PSMB2-2: TTCCCTGGCAAGTGGGAGGGA) 
at the last exon of PSMB2 were cloned into ptCMV- 
153/47. The assembled TALEN plasmids and a donor vector pcDNA3- 
PSMB2-linker-eGFP-3xFlag (Puro) or pcDNA3-PSMB2-linker-FusionRed 
(Puro) were co-transfected into HCT116 cells. Puromycin-resistant 
clones were isolated and validated by western blot and DNA sequencing. 
PSMD6-eGFP-3xFlag-KI HCT116 cells were generated using the TALE 
nuclease (TALEN) Kit, a gift from F. Zhang (Addgene 1000000019). 
TALEN-targeting sequences (PSMD6-1: TCCAGAGTAATTAATATGTA; 
PSMD6-2: TCCAGAGTAATTAATATGTA) at the last exon of PSMD6 
were cloned into pTALEN_v2. The assembled TALEN plasmids and the 
donor vector, pBlueScript-PSMD6-linker-eGFP-3xFlag (Puro), were 
co-transfected into HCT116 cells, and puromycin-resistant clones were 
isolated and validated. eGFP-ubiquitin, RPL29-eGFP, RPL15-eGFP or 
RPS2-eGFP was knocked in to the AAVS1 locus of HCT116 cells with 
the TALEN Kit and the TALEN-targeting sequences AAVS1-1 
(TGTCCCCTCCACCCCACA) and AAVS1-2 (TTTCTGTCACCAATCCTG). The 
TALEN plasmids and a donor vector pBlueScript-EF1pr-EGFP- 
ubiquitin (Neo), pBlueScript-EF1pr-RPL15-linker-eGFP-3xFlag (Neo), 
pBlueScript-EF1pr-RPL29-linker-eGFP-3xFlag (Neo) or pBlueScript- 
EF1pr-RPS2-linker-eGFP-3xFlag (Neo) were co-transfected into HCT116 
cells. G418-resistant clones were isolated and validated by immunoblot 
and DNA sequencing. 


Generation of UBE3A-KO and RAD23B-KO cells 

A CRISPR guide sequence targeting exon 3 of UBE3A or exon 5 of RAD23B 
was cloned into pSpCas9(BB)-2A-Puro (PX459) v.2.0, a gift from F. 
Zhang (Addgene 62988). Sequences were as follows: UBE3A-1: 
CACCGCTTACCTTGAGAACTCGAA; UBE3A-2: AAACTTCGAGTTCTCAAGGTAAGC; 
RAD23B-1: CACCGAAGATGCAACGAGTGCACT; RAD23B-2: 
AAACAGTGCACTCGTTGCATCTTC. Puromycin-resistant clones were 
isolated and validated by western blotting and DNA sequencing. 
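The four oligonucleotides listed above follow a pattern that is common for cloning guides into PX459-type vectors: a top oligo made of `CACC` plus the guide, and a bottom oligo made of `AAAC` plus the reverse complement of the guide. The sketch below assumes that convention and assumes the 20-nt guides are what remains after stripping `CACC` from the "-1" oligos; neither assumption is stated in the source, and the function names are ours. It regenerates both oligos from the guide so the listed pairs can be checked for consistency.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def guide_oligos(guide):
    """Return (top, bottom) cloning oligos for a guide sequence.

    Assumed convention: top = CACC + guide, bottom = AAAC + revcomp(guide).
    """
    guide = guide.upper()
    if not guide or set(guide) - set("ACGT"):
        raise ValueError("guide must be a non-empty string of A, C, G, T")
    return "CACC" + guide, "AAAC" + reverse_complement(guide)


# Guides inferred from the '-1' oligos in the text (CACC prefix removed).
guides = {
    "UBE3A": "GCTTACCTTGAGAACTCGAA",
    "RAD23B": "GAAGATGCAACGAGTGCACT",
}
for name, guide in guides.items():
    top, bottom = guide_oligos(guide)
    print(name, top, bottom)
```

Under these assumptions the generated pairs match the UBE3A-1/UBE3A-2 and RAD23B-1/RAD23B-2 sequences given in the text, which is a quick way to catch transcription errors in oligo lists.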


Generation of RAD23B add-back cell lines 

RAD23B add-back cell lines were generated using a retroviral system. 
The RAD23B mutants L8A and L225A/L401A were generated from human 
RAD23B by PCR mutagenesis using the QuikChange 
mutagenesis kit (Agilent Technologies). Wild-type and mutant sequences 
were cloned into the pMX-Puro retroviral 
expression vector (Cell Biolabs) to yield pMX-puro-RAD23B(WT)- 
FusionRed-3xFlag, pMX-puro-RAD23B(L8A)-FusionRed-3xFlag, and 
pMX-puro-RAD23B(L225A/L401A)-FusionRed-3xFlag, respectively. 
Virus particles were produced in HEK293T cells co-transfected with 
Gag-Pol, VSV-G, and RAD23B retrovirus plasmids in six-well plates. 
After 12 h of transfection, the medium was replaced and the cells were 
cultivated for an additional 24 h. Viral supernatants were then used to 
infect RAD23B-KO cells. Puromycin-resistant clones were isolated and 
validated by western blotting. 


Microscopy 

Live-cell imaging experiments were performed on a CV1000 automated 
spinning-disk microscope (Yokogawa Electric Corporation) equipped 
with a UPLSApo 60×O 1.35 NA (Olympus), or on an IX73 inverted 
fluorescence microscope (Olympus) equipped with an enhanced CSU-X1 
spinning disk (Microlens-enhanced dual Nipkow disk confocal 
scanner) (Yokogawa), a PlanApo 100×O TIRF 1.45 NA (Olympus), and an 
Andor Neo sCMOS camera (Andor) or an ORCA-Flash4.0 V3 Digital 
CMOS camera (Hamamatsu Photonics). All cells were maintained in 
a humidified environment at 37 °C under 5% CO2. Three-dimensional 
imaging experiments were performed on a Leica TCS SP8 laser-scanning 
microscope (Leica Microsystems) with a HC PL APO 100×/1.40 NA oil 
CS2 (Leica). In vitro droplet formation of protein samples was 
monitored on a Fluoview FV3000 confocal laser scanning microscope with 
an IX83 fully motorized inverted microscope and a UAPON 100×O TIRF 
1.49 NA (Olympus). 


Immunofluorescence and time-lapse imaging 

Cells were initially plated in 35-mm glass-bottomed dishes (MatTek) 
coated with poly-L-lysine. For immunofluorescence, cells were fixed 
in 4% paraformaldehyde (Thermo Fisher Scientific) in PBS for 15 min, 
and then permeabilized with MeOH (Wako) for 5 min at -20 °C. Before 
antibody incubation, cells were blocked with 1% FBS in PBS. The primary 
antibodies used were as follows: anti-multi-ubiquitin mouse 
monoclonal FK2 (NBT-MFK003; Nippon Bio-Test Laboratories), 
anti-multi-ubiquitin rabbit polyclonal (Z0458; Dako), Lys48-Specific anti-ubiquitin 
rabbit monoclonal (05-1307; clone Apu2; Millipore), or Lys63-Specific 
anti-ubiquitin rabbit monoclonal (05-1308; clone Apu3; Millipore); 
anti-PSMA1 mouse monoclonal antibody; anti-RAD23B rabbit 
monoclonal (13525; Cell Signaling Technology); anti-VCP mouse monoclonal 
(ab11433; Abcam); anti-UBE3A rabbit polyclonal (10344-1-AP; Protein 
Tech); anti-RPS2 rabbit polyclonal (ab155961; Abcam); anti-RPS6 rab- 
bit monoclonal (2217; Cell Signaling Technology); anti-RPS9 rabbit 
polyclonal (18215-1-AP; Protein Tech); anti-RPL4 mouse monoclonal 
(sc-100838; Santa Cruz Biotechnology); anti-RPL7A rabbit polyclonal 
(2415; Cell Signaling Technology); anti-RPL15 rabbit polyclonal 
(16740-1-AP; Protein Tech); anti-RPL29 mouse polyclonal (H00006159-B01P; 
Abnova); anti-RPL35 rabbit polyclonal (SAB4500233; Sigma-Aldrich); 
anti-rRNA mouse polyclonal (sc-33678; Santa Cruz Biotechnology); 
anti-PML mouse monoclonal (sc-966; Santa Cruz Biotechnology); anti- 
Coilin mouse monoclonal (ab11822; Abcam); anti-BMI1 rabbit mono- 
clonal (6964; Cell Signaling Technology); anti-SC35 mouse monoclonal 
(ab11826; Abcam); anti-CENPC guinea pig polyclonal (PD030; MBL, 
Medical & Biological Laboratories); and anti-phospho-Histone H2A.X 
(Ser139) mouse monoclonal (05-636; Millipore). The following second- 
ary antibodies were purchased from Thermo Fisher Scientific: Alexa 
Fluor 488-conjugated anti-mouse (A-11029), anti-rabbit (A-11036); Alexa 
Fluor 568-conjugated anti-mouse (A-11031), anti-rabbit (A-11036), and 
anti-guinea pig (A-11075); and Alexa Fluor 647-conjugated anti-mouse 
(A-21236), and anti-rabbit (A-21245). Antibodies were diluted in PBS 
with 0.1% FBS. Samples were incubated with antibodies for 1 h at room 
temperature. After incubation, cells were treated with DAPI (Thermo 
Fisher Scientific) for 15 min and coverslipped (Matsunami) with Slow- 
Fade Gold (Thermo Fisher Scientific). For time-lapse experiments, the 
medium was replaced with Phenol red-free D-MEM/F12 (Thermo Fisher 
Scientific) supplemented with 10% FBS. Cells were incubated for 1 h, 
transferred to an incubator microscope (described above), maintained 
at 37 °C in 5% CO2, and imaged for 1 to 3 h. 


Data processing of microscopy images 

All image analysis was performed using the Metamorph software 
(Molecular Devices). For quantification of proteasome foci, maximum 
projections of 16 z-stack images (0.2 μm apart) were manually 
segmented with the nucleus (identified by DAPI staining), and foci 
number and diameter were analysed using Transfluor. For quantifica- 
tion of proteasome foci in time-lapse imaging analysis, the diameter 
was calculated from the average foci area per cell, and the median per 
view field was averaged. To generate 3D images, we took 30 optical 
sections spaced 0.06 μm apart. The image view of the 3D point (xy, 
yz, xz) was given by projective transformation. For quantification of 
circularity of proteasome foci, images were processed through the 
close-open filter and analysed by integrated morphometric analysis. 
Circularity was calculated according to the following formula: 
circularity = 4π(area/perimeter²). For quantification of RPL29 condensates, a 
maximum projection of three z-stacks (0.2 μm apart) was manually 
segmented with the nucleoplasm, and the number of foci per cell and 
the diameter were analysed using integrated morphometric analysis. 
The Pearson correlation coefficient (r) was calculated from a scatter 
plot of the fluorescence intensities of two proteins. 
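
As an illustration of the shape measurements described above, the following sketch evaluates the circularity formula and converts a focus area to a diameter. The equivalent-circle conversion is an assumption made here for illustration; the text does not state how diameter was derived from area in Metamorph.

```python
import math


def circularity(area: float, perimeter: float) -> float:
    """Circularity = 4*pi*area / perimeter**2 (1.0 for a perfect circle)."""
    return 4.0 * math.pi * area / perimeter ** 2


def equivalent_diameter(area: float) -> float:
    """Diameter of a circle with the given area (assumed conversion)."""
    return 2.0 * math.sqrt(area / math.pi)


# A circle of radius 1 has circularity 1; a unit square has circularity pi/4.
circle = circularity(math.pi * 1.0 ** 2, 2.0 * math.pi * 1.0)
square = circularity(1.0, 4.0)
print(round(circle, 3), round(square, 3))
```

Values close to 1 indicate near-spherical foci, whereas elongated or irregular objects give lower values.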


FRAP analysis 

In-cell FRAP analysis was performed using a Leica TCS SP8 laser- 
scanning microscope equipped with a HC PL APO 100x/1.40 NA oil 
CS2. Intracellular assemblies were bleached in a circular 1-μm² region 
of interest using a 3.25-s pulse of the 405-nm laser line at full power. 
Recovery was monitored every 0.65 s for 400 frames. In vitro FRAP was 
performed using an Olympus Fluoview FV3000 equipped with a UAPON 
100x OTIRF 1.49 NA (Olympus). Droplet assemblies were bleached in a 
circular 0.4-μm² region of interest using a 0.5-s pulse of the 488-nm laser 
line at full power. Recovery was monitored every 0.5 s for 180 frames. 
Recovery curves were analysed using Metamorph. Plotting and curve 
fitting were carried out in GraphPad Prism 7 (GraphPad Software). 
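
The recovery analysis can be illustrated with a short sketch. It is not the Metamorph or Prism procedure used here: the normalization scheme, the use of the final frame as the plateau, and the function names are all assumptions chosen to keep the example self-contained.

```python
def normalize_frap(intensities, n_prebleach):
    """Scale a trace so the pre-bleach mean is 1 and the first post-bleach frame is 0."""
    pre = sum(intensities[:n_prebleach]) / n_prebleach
    post = intensities[n_prebleach]
    return [(value - post) / (pre - post) for value in intensities[n_prebleach:]]


def half_recovery_time(normalized, frame_interval_s):
    """Time after bleaching at which half of the plateau is reached.

    Assumption: the last frame approximates the plateau. Linear interpolation
    is used between frames. Returns None if the level is never crossed.
    """
    target = normalized[-1] / 2.0
    for i in range(1, len(normalized)):
        low, high = normalized[i - 1], normalized[i]
        if low < target <= high:
            fraction = (target - low) / (high - low)
            return (i - 1 + fraction) * frame_interval_s
    return None


# Toy trace: two pre-bleach frames, then recovery sampled every 0.5 s.
trace = [100, 100, 20, 40, 60, 60]
curve = normalize_frap(trace, n_prebleach=2)
t_half = half_recovery_time(curve, frame_interval_s=0.5)
print(curve, t_half)
```

With the acquisition settings given above, 400 frames at 0.65 s span 260 s in cells and 180 frames at 0.5 s span 90 s in vitro.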


Transmission electron microscopy 

Samples were fixed with 2% paraformaldehyde and 2% glutaraldehyde 
in 0.1 M phosphate buffer (pH 7.4) at 37 °C for 30 min, and then at 4 °C 
for 30 min. Afterwards, they were fixed with 2% glutaraldehyde in 0.1 M 
phosphate buffer (pH 7.4) at 4 °C overnight. The cells were then washed 
with the same buffer and post-fixed with 2% osmium tetroxide (OsO4) 
in the same buffer for 1 h. The samples were then dehydrated for 1 h in a 
graded ethanol solution and embedded in a resin (Quetol-812; Nisshin 
EM) for 2 days, and polymerized at 60 °C for 48 h. Ultrathin sections 
(70-nm thickness) were made with a diamond knife on a Leica Ultra- 
cut UCT ultramicrotome (Leica), and then mounted on copper grids. 
Sections were stained with 2% uranyl acetate and lead stain solution 
(Merck) and visualized on a JEM-1400Plus electron microscope (JEOL) at 
an acceleration voltage of 100 kV. Digital images (3,296 x 2,472 pixels) 
were acquired with an EM-14830RUBY2 camera (JEOL). 


Correlative light electron microscopy 

Image characteristics (pixel size, lateral resolution, and axial resolu- 
tion) of fluorescence emission and electron density were merged using 
ec-CLEM on the open-source software platform Icy (Institut Pasteur). 
For alignment, this algorithm relies on manual identification of matching 
landmarks in the cell in single-section images, and it operates with 
an accuracy determined by the degree of sample distortion caused by 
the intermediate sample preparation step. 


Immunoprecipitation 

Cells were collected and lysed in buffer A (50 mM Tris-HCl, pH 7.5, 100 mM 
NaCl, 10% glycerol, 10 mM iodoacetamide) containing 0.2% NP-40 
and complete protease inhibitor cocktail (Roche, EDTA-free). After 
standing on ice for 30 min, the lysate was sonicated on a Handy Sonic 
(Tomy Seiko), and then collected by centrifugation (20,000g for 15 min 
at 4 °C). The protein concentration was determined using the BCA 
protein assay kit (Thermo Fisher Scientific). For anti-Flag 
immunoprecipitation, anti-DDDDK-tag mAb-Magnetic beads (Medical and Biological 
Laboratories, M185-11) were used to precipitate Flag-tagged protein 
complexes from 1 mg of cell lysate by incubating for 1 h at 4 °C. After five 
washes with buffer A containing 0.2% NP-40, the proteins were eluted 
for 10 min at 70 °C in 1x NuPAGE LDS sample buffer. 


SDS-PAGE and immunoblotting 

Cells were lysed with buffer A with complete protease inhibitor cocktail 
(EDTA free) (Roche) for 30 min on ice, and then centrifuged at 12,000g 
for 5 min at 4 °C. Supernatants were collected, and protein 
concentration was measured using the BCA protein assay kit (Thermo Fisher 
Scientific). Cell lysates were boiled in 1x NuPAGE LDS sample buffer 
for 10 min, and then electrophoresed on 4-12% NuPAGE Bis-Tris gels 
(Thermo Fisher Scientific). Proteins were transferred to polyvinylidene 
difluoride membranes (Millipore). The membranes were blocked for 
30 min in 5% non-fat milk, and then incubated for 1 h at room temperature 
with primary antibodies. The primary antibodies used were as follows: 
anti-Flag mouse monoclonal (A8592; Sigma-Aldrich); anti-multi-ubiq- 
uitin rabbit polyclonal (Z0458; Dako); anti-PSMB2 rabbit polyclonal 
(BML-PW8890; Enzo Life Sciences); anti-PSMD6 rabbit polyclonal 
(BML-PW8225; Enzo Life Sciences); Lys48-Specific anti-ubiquitin rab- 
bit monoclonal (05-1307; clone Apu2; Millipore); anti-RAD23B rabbit 
monoclonal (13525; Cell Signaling Technology); anti-UBE3A rabbit 
polyclonal (10344-1-AP; Protein Tech); anti-cleaved caspase-3 (Asp175) 
rabbit polyclonal (9661; Cell Signaling Technology); and anti-β-actin 
pAb-HRP-DirecT (PD030; MBL). After extensive washing with TBST, 
the membranes were immunoblotted with secondary antibodies for 
30 min at room temperature. The following secondary antibodies 
were purchased from Jackson ImmunoResearch Laboratories: HRP- 
conjugated goat anti-rabbit Ig and HRP-conjugated goat anti-mouse 
Ig. After washing several times with TBST, blots were developed using 
ECL Prime Western Blotting Detection Reagent (GE Healthcare) and 
analysed on an ImageQuant LAS4000 (GE Healthcare). 


Northern blotting 

Total RNA was isolated with TRIzol reagent (Thermo Fisher Scientific). 
Total RNA (4 μg per lane) was resolved on a 1.2% formaldehyde agarose 
denaturing gel in 1x TT (30 mM Tricine and 30 mM triethanolamine) 
buffer by electrophoresis for 120 min at 200 V, followed by capillary 
transfer to Hybond-N+ membrane (GE Healthcare) with 20x SSC (3 M 
NaCl and 300 mM trisodium citrate dihydrate) for 18 h. After transfer, 
RNA was UV cross-linked to the membrane using a CL-1000 ultraviolet 
crosslinker (UVP) at 120 mJ cm⁻². Membranes were pre-hybridized with 
DIG Easy Hyb Granules (Roche) dissolved in double-distilled water 
(hybridization buffer) for 1 h in a hybridization oven at 50 °C. DIG-
labelled DNA probes were added to the buffer and incubated for 20 h. 
Membranes were washed with non-stringent buffer (2x SSC and 0.1% 
SDS) for 15 min at 50 °C, followed by stringent buffer (0.1x SSC and 0.1% 
SDS) twice for 15 min each. The membrane was then washed with 1x MA 
buffer (100 mM maleic acid and 150 mM NaCl, pH 7.0) for 10 min at room 
temperature, and incubated with Blocking Reagent (Roche) for 30 min. 
Anti-digoxigenin-AP Fab fragments (Roche) were added to the blocking 
reagent, and the membranes were incubated for a further 1 h. After that, the membranes were 
washed three times for 10 min each with 1x MA buffer containing 0.3% 
Tween-20, and then equilibrated in buffer-A (100 mM Tris-HCl and 100 
mM NaCl, pH 9.5). To detect RNA, the membranes were incubated with 
CDP-star (Roche) for 10 min, and then chemiluminescence was 
visualized on a LAS-4000 mini (GE Healthcare). DNA probes were labelled 
with DIG at the 5’ end; probe sequences were as follows: 5’ETS, 
5’-CGGAGGCCCAACCTCTCCGACGACAGGTCGCCAGAGGACAGCGTGTCAGC-3’; 
ITS1, 5’-CCTCGCCCTCCGGGCTCCGGGCTCCGTTAATGATC-3’; and ITS2, 
5’-CTGCGAGGGAACCCCCAGCCGCGCA-3’. 


Purification of recombinant proteins 

Codon-optimized human RAD23B or RAD23A cDNA (Eurofins Genom- 
ics) was subcloned into Escherichia coli expression vector pGEX6P1 (GE 
Healthcare), in which the Cys sequence (TGC) was inserted upstream 
of the BamHI site for Cy labelling. RAD23B mutants lacking the UBL domain 
(Δ1-79) or the two UBA domains (Δ180-238/Δ353-409) were generated by 
inverted PCR. Human ubiquitin with an N-terminal Met-Cys was inserted 
into vector pET26b (Novagen). GST-Cys-RAD23 proteins or 
Met-Cys-ubiquitin were expressed in E. coli BL21(DE3) cells by induction with 
0.2 mM IPTG at 28 °C for 3 h. GST-tagged proteins were adsorbed onto 
glutathione-Sepharose 4B (GE Healthcare) in buffer B (50 mM HEPES-
KOH, pH 7.5, 100 mM NaCl and 10% glycerol) containing 0.1% Triton 
X-100. After washing with the same buffer, the beads were incubated 
with PreScission protease (GE Healthcare) to cleave the GST-tag from 
GST-Cys-RAD23 proteins, and the Cys-RAD23 proteins were 
recovered. Cells expressing Met-Cys-ubiquitin were suspended in 50 mM 
HEPES-KOH, pH 7.5 and 100 mM NaCl and lysed by sonication. After 
centrifugation at 20,000g for 30 min, the supernatant was recovered 
and boiled at 80 °C for 5 min. The lysate was further clarified by cen- 
trifugation (20,000g for 20 min) to obtain Met-Cys-Ubiquitin. Cy3 or 
Cy5 labelling was performed using Cy Maleimide Mono reactive dye (GE 
Healthcare). After Cy labelling, Cy5-ubiquitin and Cy3-RAD23 proteins 
were separated from free dye on a PD-10 column (GE Healthcare). 
Unanchored K48-linked ubiquitin chains and K63-linked ubiquitin chains 
were prepared using UBE2K/E2-25K and Ubc13/Mms2, respectively, 
as previously reported. To obtain Cy5-labelled ubiquitin chains with 
defined lengths, we further purified the ubiquitin chains by Superdex 
75 10/300 GL size exclusion chromatography (GE Healthcare). 


In vitro phase-separation assay 

Droplet formation of the purified protein was monitored by fluores- 
cence and differential interference contrast (DIC) microscopy using 
an Olympus Fluoview FV3000 (described above). Unless otherwise 
noted, 20 μM Cy5-ubiquitin and 20 μM Cy3-RAD23 proteins were 
incubated at room temperature in 50 mM Tris-HCl (pH 7.5), 3% PEG 
(Sigma-Aldrich) and 200 mM NaCl. The mixture in a total solution 
volume of 10 μl was placed in a glass-bottom dish (MatTek) or 
Sensoplate glass-bottom 384-well plate (Greiner Bio-One) coated with 0.1% 
PVA (Sigma-Aldrich). 


Phase diagram mapping 

After mixing RAD23B and ubiquitin proteins as described above, 
the mixture was placed on Sensoplate glass-bottom 384-well plates 
(Greiner) coated with 0.1% PVA (Sigma-Aldrich) and incubated for 
90 min at room temperature. The presence or absence of droplets was 
scored by Cy3 and Cy5 fluorescence in each well at 100x magnification 
using an Olympus Fluoview FV3000. For fluorescence observation, the 
maximum projection of three z-stacks from the bottom of the plate 
was processed, and the scale was set in the range from 0 to 4,095. Two 
wells were prepared per experiment, and if fluorescence was recorded 
at three or more areas in each well, it was regarded as a droplet. The 
experiment was performed twice. 
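
The scoring rule for the phase diagram can be written compactly. The sketch below encodes only the per-experiment criterion stated above (at least three fluorescent areas in each of the two wells); how the two repeat experiments were combined is not specified, so it is left out. The function name and input format are illustrative.

```python
def condition_has_droplets(areas_per_well, min_areas=3):
    """Score one condition in one experiment.

    areas_per_well: number of fluorescent areas counted in each well.
    A condition is scored positive only if every well reaches min_areas.
    """
    if not areas_per_well:
        raise ValueError("at least one well is required")
    return all(count >= min_areas for count in areas_per_well)


print(condition_has_droplets([5, 3]))  # both wells reach the threshold
print(condition_has_droplets([5, 2]))  # one well falls short
```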


Quantification of RAD23B and RAD23A in HCT116 cells 

Cellular abundances of RAD23A and RAD23B were determined by 
quantitative western blotting using recombinant protein as a standard. 
After washing twice with PBS, HCT116 cells were lysed by sonication 
in 2% SDS, 20 mM HEPES-Na (pH 7.5) and 1 mM EDTA. Proteins were 
subjected to western blotting with anti-RAD23A antibody (24555; Cell 
Signaling Technology) or anti-RAD23B antibody (13525; Cell Sign- 
aling Technology). Copy number per cell and concentration were 
calculated, assuming that total protein per cell was approximately 
200 pg and that cell volume was approximately 1 pl, as described 
previously. 
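
The conversion from a western-blot estimate to copy number and concentration is simple arithmetic, sketched below using the stated assumptions of 200 pg total protein and 1 pl volume per cell. The input value (1 ng of target per μg of lysate) and the 50-kDa molecular mass are hypothetical numbers for illustration only, not measurements from this study.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_and_concentration(target_ng_per_ug_total, mw_da,
                             protein_per_cell_pg=200.0, cell_volume_pl=1.0):
    """Return (copies per cell, concentration in micromolar).

    target_ng_per_ug_total: ng of target protein per ug of total lysate,
    read from a recombinant-protein standard curve.
    """
    target_pg_per_cell = target_ng_per_ug_total / 1000.0 * protein_per_cell_pg
    moles_per_cell = target_pg_per_cell * 1e-12 / mw_da
    copies = moles_per_cell * AVOGADRO
    concentration_um = moles_per_cell / (cell_volume_pl * 1e-12) * 1e6
    return copies, concentration_um


# Hypothetical example: 1 ng target per ug lysate, 50 kDa protein.
copies, conc_um = copies_and_concentration(1.0, 50_000.0)
print(f"{copies:.3g} copies per cell, {conc_um:.2f} uM")
```

Because both assumed constants enter linearly, an error in the assumed protein content or cell volume shifts the estimate by the same factor.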


Propidium iodide exclusion assay and measurement of 
caspase-3 activity 

Cell death under hyperosmotic stress was quantified by propidium 
iodide (PI) exclusion. Immediately after washing with PBS, PI 
(Biovision) was added to the cells at a final concentration of 50 μg ml⁻¹ and 
incubated at room temperature for 15 min, after which excess PI was 
washed off with PBS. At least two random fields were imaged using an 
Olympus Fluoview FV3000 and processed with the MetaMorph 
software (Molecular Devices). Cell death was quantified by calculating the 
number of PI-labelled cells as a percentage of the total number of 
cells observed by DIC. Early apoptotic activity was assessed using the 
cleaved-caspase-3 (Asp175) antibody purchased from Cell Signaling 
Technology (9661). The experiment was repeated three times. 


Statistical analysis 

Statistical analysis was performed using GraphPad Prism 7. All details 
on statistics have been indicated in figure legends and Source Data. 
Sample distribution was assessed with the D’Agostino-Pearson normal- 
ity test or the Kolmogorov-Smirnov test depending on sample size. 
Unpaired two-tailed Student’s t-test was used to determine statistical 
significance when comparing two independent groups with normal 
distribution and no significant difference in their s.d. When comparing 
two independent groups in which at least one did not fit the normal- 
ity criteria, unpaired two-tailed Mann-Whitney U-test was used. For 
analysis of statistical significance in comparisons involving more than 
two groups with normal distribution, ordinary one-way ANOVA with 
Dunnett’s (when comparing the mean of each group with the mean 
of a control group) multiple comparisons test was used. For multiple 
comparisons in which at least one group did not follow a normal 
distribution, Kruskal-Wallis and Dunn’s post hoc tests were used. 
In all cases, statistical significance was assessed with a 95% confidence 
interval; therefore, P < 0.05 was considered significant. 
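
The test-selection rules described above amount to a small decision tree, sketched here as a function. It only returns the name of the test implied by the text; the normality and variance checks themselves are assumed to have been run elsewhere (for example in Prism), and the case of two normal groups with unequal s.d. is not covered by the text.

```python
def choose_test(n_groups: int, all_normal: bool, equal_sd: bool = True) -> str:
    """Return the test implied by the stated rules for a given comparison."""
    if n_groups < 2:
        raise ValueError("need at least two groups")
    if n_groups == 2:
        if not all_normal:
            return "unpaired two-tailed Mann-Whitney U-test"
        if equal_sd:
            return "unpaired two-tailed Student's t-test"
        return "not specified in the text"
    if all_normal:
        return "one-way ANOVA with Dunnett's multiple comparisons test"
    return "Kruskal-Wallis with Dunn's post hoc test"


print(choose_test(2, all_normal=True))
print(choose_test(5, all_normal=False))
```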


Purification of ubiquitylated proteins and sample preparation 
for mass-spectrometry analysis 

HCT116 cells (1x 10’) were treated with 0.2 M sucrose for the indicated 
periods of time (0, 15, 30 min) before collection. The cells were lysed in a 
urea-containing lysis buffer (50 mM Tris-HCl, pH 7.5, 100 mM NaCl, 10% 
glycerol and 6 M urea) supplemented with 10 μM bortezomib (LC 
Laboratories), 10 mM iodoacetamide (GE Healthcare) and 1x Complete 
protease inhibitor cocktail (Roche, EDTA free), and sonicated on a Handy 
Sonic (Tomy Seiko). The lysate was then clarified by centrifugation 
(20,000g for 20 min), and the protein concentration was determined 
using the BCA protein assay kit (Thermo Fisher Scientific). To capture 
ubiquitylated proteins, we used a biotinylated trypsin-resistant tandem 
ubiquitin-binding entity (bioTR-TUBE). bioTR-TUBE (10 μg) and 5 mg of 
cell lysate were incubated for 1 h at 4 °C and then for an additional 
1 h at 4 °C with 50 μl of Dynabeads MyOne streptavidin C1 (Thermo 
Fisher Scientific). After two washes with lysis buffer containing 6 M urea 
and two washes with urea-free lysis buffer, the beads were suspended 
in 50 μl of 50 mM ammonium bicarbonate (AMBC) and 0.1% RapiGest 
SF (Waters). Proteins on beads were reduced in 5 mM TCEP for 30 min 
at 50 °C and then alkylated with 10 mM methylmethanethiosulfonate 
for 10 min at room temperature. The alkylated proteins were digested 
overnight at 37 °C with 1 μg of trypsin Gold (Promega). After tryptic 
digestion, samples were acidified to ~pH 2 with trifluoroacetic acid 
(TFA) and desalted by solid-phase extraction using GL-Tip SDB and 
GL-Tip GC (GL Sciences). Ubiquitinated peptides were enriched using 
the PTMScan ubiquitin Remnant Motif (K-ε-GlyGly) Kit (Cell 
Signaling Technology). After three washes with 200 mM triethanolamine 
(pH 8.2), antibody-coupled beads were cross-linked with 20 mM 
dimethyl pimelimidate in 200 mM triethanolamine (pH 8.2) for 1 h at room 
temperature. The beads were then washed three times with 50 mM 
Tris-HCl (pH 7.5), 100 mM NaCl, and stored at 4 °C until use. Trypsinized 
peptides prepared as described above were suspended in 0.2 ml IAP 
buffer (50 mM MOPS, pH 7.2, 10 mM sodium phosphate and 50 mM 
NaCl) and incubated with 10 μl of antibody-coupled beads for 2 h at 
4 °C. After two washes with 250 μl of IAP buffer and three washes with 
250 μl of Milli-Q water (Millipore), di-Gly peptides were eluted with 
3 x 20 μl of 0.15% TFA. The eluted peptides were desalted to 0.1% TFA 
using GL-Tip SDB and GL-Tip GC (GL Sciences), and analysed by mass 
spectrometry. 


Purification of proteasome-interacting proteins for mass- 
spectrometry analysis 

After addition of 0.2 M sucrose and incubation for 30 min, HCT116 
PSMD6-eGFP-3xFlag^KI/KI cells were cross-linked with 0.2% formaldehyde 
(Thermo Scientific) for 10 min, after which the reaction was quenched 
with 0.25 M glycine. After two washes with PBS, the cells were lysed with 
lysis buffer (50 mM Tris-HCl, pH 7.5, 100 mM NaCl, 10% glycerol, 4 mM 
ATP, 10 mM MgCl2, 0.2% NP-40, 1x Protein Complete Inhibitor-EDTA, 
1 mM bortezomib, 1 mM iodoacetamide) with sonication. Cleared 
lysates (2 mg) were incubated with 10 μl of anti-DDDDK 
antibody-coupled agarose (MBL) for 1 h at 4 °C. After the beads were washed five times 
with lysis buffer, proteins were eluted with 3xFlag peptide (Sigma). The 
eluted proteins were mixed with 3x LDS NuPAGE sample buffer and 
incubated for 30 min at 95 °C to reverse the cross-links. Proteins were then 
electrophoresed for 1 cm on a 4-12% NuPAGE gel. The gels were stained with 
Bio-Safe Coomassie Stain (Bio-Rad). After washing extensively with Milli-Q 
water, the gels were excised, cut into 1-mm³ pieces, destained twice 
for 1 h each with 1 ml of 50 mM AMBC/50% acetonitrile (ACN), with 
agitation, and then dehydrated with 100% ACN. Trypsin solution (Promega, 
20 ng μl⁻¹ in 50 mM AMBC/5% ACN) was added to the gel pieces, and the 
samples were incubated at 37 °C overnight. Digests were extracted with 
100 μl of 50% ACN/0.1% TFA. The extracted peptides were concentrated 
using a speed-vac and resuspended in 0.1% TFA. 


Mass-spectrometry analysis 

Shotgun mass-spectrometry analysis was performed essentially as 
described previously. An Easy nLC 1000 system (Thermo Fisher Scientific) was 
connected online to a Q Exactive mass spectrometer (Thermo Fisher 
Scientific) with a nanoelectrospray ion source. The mobile phases 
were 0.1% formic acid (FA) in water (solvent A) and 0.1% FA in 100% ACN 
(solvent B). Peptides were directly loaded onto a C18 analytical column 
(ReproSil-Pur 3 μm, 75-μm inner diameter and 12-cm length, Nikkyo 
Technos) and separated using a 150-min three-step gradient (0-40% 
solvent B for 120 min, 40-100% for 20 min and 100% for 10 min) at a 
constant flow rate of 300 nl min⁻¹. For ionization, liquid junction 
voltage was 1.8 kV and capillary temperature was 250 °C. The Q Exactive 
was operated by the Xcalibur software 2.2 (Thermo Fisher Scientific) 
in data-dependent MS/MS mode, and the top 10 most intense ions 
with charge state +2 to +5 were selected with an isolation window of 
m/z=2.0 and fragmented by higher-energy collisional dissociation 
with a normalized collision energy of 25. Resolution and automatic 
gain control target for the survey scans were set to 70,000 and 3E6, 
respectively. Ions selected for MS/MS were dynamically excluded for 
5 s (for diGly peptide identification) or 30 s (for binding protein 
identification). The data were analysed using the Mascot search program 
(Matrix Science) in Proteome Discoverer 1.3 (Thermo Fisher Scientific). 
Maximum missed cleavage site of trypsin was set to two. Oxidation 
(Met), GlyGly (Lys), phosphorylation (Ser, Thr, Tyr) and pyroglutamate 
conversion (Gln) were selected as variable modifications. For the diGly 
proteome, Cys-methylthio modification was set as a static modifica- 
tion for database searches. Peptide identification was filtered at a false 
discovery rate <0.01. 
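
The three-step gradient described above can be expressed as a piecewise function of time. The sketch assumes linear ramps within each step, which the text does not state explicitly; the function name is illustrative.

```python
def percent_solvent_b(t_min: float) -> float:
    """Percentage of solvent B at time t (min) in the 150-min gradient.

    Steps: 0-40% B over 0-120 min, 40-100% B over 120-140 min,
    then 100% B held for 140-150 min. Linear ramps are assumed.
    """
    if not 0.0 <= t_min <= 150.0:
        raise ValueError("time outside the 150-min gradient")
    if t_min <= 120.0:
        return 40.0 * t_min / 120.0
    if t_min <= 140.0:
        return 40.0 + 60.0 * (t_min - 120.0) / 20.0
    return 100.0


for t in (0, 60, 120, 130, 150):
    print(t, percent_solvent_b(t))
```

The step durations (120 + 20 + 10 min) add up to the stated 150-min run.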


Cryo-EM 
For sample preparation, 25,000 cells were seeded on EM grids (R2/1, Au 
200 mesh grid, Quantifoil Micro Tools) in a 35-mm dish and cultured in 
DMEM supplemented with penicillin-streptomycin, 1% non-essential 
amino acids and 10% fetal bovine serum overnight. Thirty minutes after 
sucrose stimulation by replacing the medium with 0.2 M sucrose, the 
grids were blotted for 10 s using filter paper and vitrified by 
plunge-freezing into a liquid ethane/propane mixture using a Vitrobot Mark 
IV (FEI). Cryo-focused ion beam (FIB) microscopy and cryo-electron 
tomographic data collection were performed as described in detail before. 
In brief, electron microscopy grids were mounted onto modified 
Autogrids sample carriers, and then transferred into a dual-beam 
(FIB/SEM) microscope (Quanta 3D FEG, FEI) using a cryo-transfer 
system (PP3000T, Quorum). For the whole procedure, samples were kept 
at a constant temperature of -180 °C. Thin lamellae with a thickness of 
about 200 nm were prepared in the nuclear region of the cells using a 
Ga⁺ ion beam at 30 kV under a 20° stage tilt angle. The milling progress 
was monitored by SEM imaging at 5 kV. The lamellae were transferred 
to an FEI Titan Krios transmission electron microscope equipped with 
a Gatan post-column energy filter and Gatan K2 Summit direct detector 
for tomographic data collection. Tilt series were recorded from −50° 
to +70° with an increment of 2° using SerialEM software at a nominal 
magnification of 42,000x, resulting in a pixel size of 3.42 Å at the 
specimen level. On average, six frames were collected for each image under 
counting mode, resulting in a total dose of 110 e⁻ Å⁻² per tilt series. 
The MATLAB (Mathworks) TOM toolbox was used 
as a general platform for image processing. K2 frames were aligned 
using in-house software (K2Align) based on previous work. K2Align 
code is available at GitHub (https://github.com/dtegunov/k2align). 
Using the IMOD software package, tilt series were first aligned using 
fiducial-less tracking, and then tomograms were reconstructed by 
weighted back-projection of the resulting aligned tilt series. To 
identify proteasomes in the tomograms, a mirrored single-capped 
proteasome structure (EMD-3916) was filtered to 60 Å as an initial template 
for template matching in the twice-binned tomograms (13.68 Å per 
voxel) using PyTom. The resulting subtomograms were cropped out, 
CTF-corrected and classified using RELION. The resulting structure 
clearly showed a 26S proteasome complex with correct handedness. 
However, owing to limitations on the number of subtomograms, we 
could not separate more conformational information. 
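
The acquisition numbers quoted above are mutually consistent, as the following sketch shows. It assumes that "twice-binned" means two successive twofold binnings and that the total dose is spread evenly over all tilts; neither assumption is spelled out in the text.

```python
def tilt_angles(start_deg=-50, stop_deg=70, step_deg=2):
    """Tilt angles of the series, both end points included."""
    return list(range(start_deg, stop_deg + 1, step_deg))


PIXEL_SIZE_A = 3.42      # unbinned pixel size at the specimen level (angstrom)
TOTAL_DOSE = 110.0       # electrons per square angstrom per tilt series

ANGLES = tilt_angles()
N_TILTS = len(ANGLES)
DOSE_PER_TILT = TOTAL_DOSE / N_TILTS       # assumes an even dose distribution
BINNED_PIXEL_A = PIXEL_SIZE_A * 2 ** 2     # two rounds of 2x binning

print(N_TILTS, round(DOSE_PER_TILT, 2), round(BINNED_PIXEL_A, 2))
```

Under these assumptions the series contains 61 images at roughly 1.8 electrons per square angstrom each, and the binned pixel size matches the 13.68 Å quoted for template matching.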


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 

Raw cryo-ET data have been deposited to the Electron Microscopy 
Data Bank under accession code EMD-10494. The mass spectrometry 
proteomics data have been deposited to the ProteomeXchange 
Consortium via the PRIDE partner repository with the data set identifiers 
PXD01637 and PXD016369. The uncropped blots and gels are provided 
in Supplementary Fig. 1. Source Data for Figs. 1-4 and Extended Data 
Figs. 1, 3-9 are provided with the paper. 


Code availability 


K2Align code is available at GitHub (https://github.com/dtegunov/k2align). 
Scale bars, 0.5 μm. Representative images from two independent experiments. 
prior), and stimulated with 0.2 M sucrose. Scale bars, 5 μm. Right graphs 
indicate the foci numbers per cell and their diameters. RPL29 foci numbers are 
mean ± s.d. from n = 82 cells (control) and n = 68 cells (CX-5461); ****P < 0.0001 
by two-tailed unpaired t-test. RPL29 foci diameters are presented as mean ± s.d. 
from n = 542 foci (control) and n = 690 foci (CX-5461); ****P < 0.0001 by 
two-tailed Mann-Whitney U-test. PSMB2 foci numbers are mean ± s.d. from n = 216 
cells (control) and n = 228 cells (CX-5461); **P = 0.0092 by two-tailed 
Mann-Whitney U-test. PSMB2 foci diameters are mean ± s.d. from n = 815 foci (control) 
and n = 1,149 foci (CX-5461); ****P < 0.0001 by two-tailed Mann-Whitney 
U-test. d, PSMB2-FusionRed^KI/KI cells transiently overexpressing (white arrows) 
or not overexpressing RPL7A-eGFP or RPS2-eGFP were stimulated with 0.2 M 
sucrose for 30 min. Scale bars, 5 μm. Similar results were obtained from three 
independent experiments. 
Extended Data Fig. 6 | Microscopy analysis of proteasome-interacting 
proteins. a, PSMB2-FusionRed^KI/KI cells were stimulated with 0.2 M sucrose for 
30 min. Endogenous p97 and ubiquitin were detected with anti-p97 and 
anti-ubiquitin antibodies, respectively. Representative xy, xz and yz images of a 
single cell from two independent experiments. Scale bars, 5 μm. b, 
PSMB2-FusionRed^KI/KI cells were stimulated with 0.2 M sucrose for 30 min. Endogenous 
p97 and RAD23B were detected with anti-p97 and anti-RAD23B antibodies, 
independent experiments. Scale bars, 5 μm. c, PSMB2-eGFP^KI/KI cells were 
transiently transfected for 48 h with siRNA targeting RAD23A, RAD23B, 
UBQLN1/2/4 (mixture) or XPC, or a control siRNA, and then stimulated with 
0.2 M sucrose for 30 min. Right graph indicates the number of foci per cell 
(siControl, n = 323 cells; siRAD23A, n = 129 cells; siRAD23B, n = 178 cells; 
siUBQLN1/2/4, n = 349 cells; siXPC, n = 339 cells). Data are mean ± s.d.; 
****P < 0.0001 by Kruskal-Wallis with Dunn’s multiple comparisons test. 
numbers per cell and their diameters. The foci numbers per cell are presented as 
mean ± s.d. from n = 35 cells (control), n = 44 cells (b-AP15), n = 37 cells 
(MLN-7243), n = 50 cells (RAD23B KO) and n = 45 cells (NMS-873). P = 0.2250 
(b-AP15), **P = 0.0085 (MLN-7243), **P = 0.0017 (RAD23B KO) and P = 0.9744 
(NMS-873) by one-way ANOVA with Dunnett’s multiple comparisons test. The 
foci diameters are presented as mean ± s.d. from n = 220 foci (control), n = 210 
foci (b-AP15), n = 132 foci (MLN-7243), n = 170 foci (RAD23B KO) and n = 271 foci 
(NMS-873). ****P < 0.0001 (b-AP15), ****P < 0.0001 (MLN-7243), ****P < 0.0001 
(RAD23B KO) and ****P < 0.0001 (NMS-873) by Kruskal-Wallis with Dunn’s 
multiple comparisons test. Scale bars, 10 μm. b, Time-lapse images of single 
foci in the b-AP15- or NMS-873-treated PSMB2-FusionRed^KI/KI 
RPL29-eGFP^AAVS1/AAVS1 cells under the same conditions described in a. Scale bars, 
0.5 μm. Representative images from four (control) or two (b-AP15 and 
NMS-873) independent experiments. 
Extended Data Fig. 8 | Characterization of RAD23B mutants in foci 
formation. a, Wild-type, RAD23B-KO (RAD23B^KO/KO) or UBE3A-KO (UBE3A^KO/KO) 
HCT116 cells were stimulated with 0.2 M sucrose for 30 min and 
immunoblotted with the indicated antibodies. Similar results were obtained 
from two independent experiments. b, PSMB2-eGFP^KI/KI RAD23B^KO/KO cells 
stably expressing FusionRed-fused RAD23B(WT), UBL (L8A) or UBA 
(L225A-L401A) were treated with 0.2 M sucrose for 30 min. Endogenous K48-linked 
ubiquitin chains were detected with K48-ubiquitin antibody and Alexa Fluor 
647-labelled secondary antibody. Merged images represent K48-ubiquitin 
antibody (red), PSMB2-eGFP (green) and DAPI (blue). Scale bars, 10 μm. 
Representative images from two independent experiments. c, 
RAD23A-FusionRed-overexpressing PSMB2-eGFP^KI/KI RAD23B^KO/KO cells were treated 
with 0.2 M sucrose for 30 min. Scale bars, 10 μm. Similar results were obtained 
from two independent experiments. d, Cellular abundances of RAD23A and 
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Extended Data Fig. 9 | Liquid-liquid phase separation of ubiquitin chains 
and RAD23B in vitro. a, SDS-PAGE and fluorescent images of CyS-labelled 
proteins used in the in vitro liquid-liquid phase separation assay. Cy5-labelled 
K48-linked ubiquitin chains (CyS-K48Ub™*) generated by enzymatic reactions 
using E2-25K (left), Cy5-labelled K48-linked ubiquitin chains size-fractionated 
by gel filtration (middle) and CyS-labelled K63-linked ubiquitin chains size- 
fractionated by gel filtration (right). Note that K63-Ub2 was omitted owing to 
difficulty of separation. b, SDS-PAGE and fluorescent images of Cy3-labelled 
RAD23 proteins. c, Effects of concentration of molecular crowding agents 

and NaCl. Images of the solution were obtained 90 min after mixing

Antibodies

Antibodies used
Antibody (clone), dilution, Supplier, Cat. number, Lot number:
Ubiquitin, 1:500 (for IF) or 1:1000 (for WB), Dako, Z0458, 00082271
Ubiquitin (FK2), 1:500, Nippon Bio-Test Laboratories, NBT-MFKO03, C7070
Ubiquitin Lys48-Specific (Apu2), 1:500 (for IF) or 1:1000 (for WB), Millipore, 05-1307, 2918991
Ubiquitin Lys63-Specific (Apu3), 1:500, Millipore, 05-1308, 3137755
PSMB2, 1:1000, Enzo, BML-PW8890, Z03617b
PSMD6, 1:1000, Enzo, BML-PW8225-0100, 01041704
PSMA1, 1:500, Kumatori A, Tanaka K et al. PNAS, 1990
VCP (5), 1:500, Abcam, ab11433, GR3180147-3
RAD23B (H-120), 1:500, Santa Cruz, sc-67225, AO809
RAD23B (D4W7F), 1:1000, Cell Signaling Technology, 13525, 1
RAD23A (D7U7Z), 1:1000, Cell Signaling Technology, 24555, 1
UBE3A, 1:100 (for IF) or 1:1000 (for WB), Protein Tech, 10344-1-AP, 00010503
RPL15, 1:500, Protein Tech, 16740-1-AP, 00008177
RPL7A (E109), 1:100, Cell Signaling Technology, 2415, 2
RPL29 (B01P), 1:50, Abnova, H00006159-B01P, GB251
RPL35, 1:50, Sigma, SAB4500233, 3114173
RPL4 (RQ-7), 1:50, Santa Cruz, sc-100838, A3017
RPS2 (EPR10834(B)), 1:100, Abcam, ab155961, GR117602-1
RPS9, 1:100, Proteintech, 18215-1-AP, 00009939
RPS6 (5G10), 1:200, Cell Signaling Technology, 2217, 7
rRNA (Y10b), 1:100, Santa Cruz, sc-33678, A0510
PML (PG-M3), 1:100, Santa Cruz, sc-966, A313
Coilin, 1:1000, Abcam, ab11822, GR127265-2
BMI1 (D20B7), 1:600, Cell Signaling Technology, 6964, 1
SC35, 1:100, Abcam, ab11826, GR144730-2
CENPC, 1:1000, MBL, PD030, 004
phospho-Histone H2A.X (Ser139) (JBW301), 1:100, Millipore, 05-636, 2310335
Cleaved Caspase-3 (Asp175) (5A1E), 1:1000, Cell Signaling Technology, 9661, 42
FLAG (M2), 1:2000, Sigma, A8592, SLBH1183V
Actin, 1:4000, MBL, PM053-7, 007
Alexa Fluor 488-conjugated anti-mouse, 1:1000, Invitrogen, A-11029, 1252783
Alexa Fluor 488-conjugated anti-rabbit, 1:1000, Invitrogen, A-11034, 1073084
Alexa Fluor 568-conjugated anti-mouse, 1:1000, Invitrogen, A-11031, 1736975
Alexa Fluor 568-conjugated anti-rabbit, 1:1000, Invitrogen, A-11036, 1832035
Alexa Fluor 568-conjugated anti-guinea pig, 1:1000, Invitrogen, A-11075, 1170591
Alexa Fluor 647-conjugated anti-mouse, 1:1000, Invitrogen, A-21236, 1511347
Alexa Fluor 647-conjugated anti-rabbit, 1:1000, Invitrogen, A-21245, 1892148A
HRP-conjugated goat anti-mouse Ig, 1:10000, Jackson ImmunoResearch Laboratories, 115-035-003
HRP-conjugated goat anti-rabbit Ig, 1:10000, Jackson ImmunoResearch Laboratories, 111-035-144, 55285

Validation

Ubiquitin (Dako, Z0458): Previously used for publication (Tsuchiya H., Mol Cell. 2017 May 18;66(4):488-502.e7.), and (Kageyama ., PLoS One. 2019 May 31;14(5):e0217945.).

Ubiquitin (FK2, Nippon Bio-Test Laboratories, 0918-2): Previously used for publication (Nozawa T et al., Autophagy. 2017;13(11):1841-1854.).

Ubiquitin, Lys48-Specific (Millipore, 05-1307): Supplier has been validated in Flow cytometry analysis of Jurkat cells, Immunocytochemistry of NIH/3T3, HeLa, and A431 cells, Immunoprecipitation via precipitated ubiquitin-containing proteins from HeLa cell lysates, Immunohistochemistry of rat brain sections and human breast carcinoma, and Western blotting of the inhibition assay using HeLa cell lysate.

Ubiquitin, Lys63-Specific (Millipore, 05-1308): Supplier has been validated in Flow cytometry analysis of Jurkat cells, Immunocytochemistry of staining of colorectal cancer tissue sections limited to tumor cells, Immunoprecipitation via precipitated ubiquitin-containing proteins from HeLa cell lysates, Immunohistochemistry of staining of colorectal cancer tissue sections limited to tumor cells, and Western blotting of the inhibition assay using HeLa cell lysate.

PSMB2 (Enzo, BML-PW8890-0100): Previously used for publication (Zana Lukic et al., Retrovirology. 2011; 8: 93.)

PSMD6 (Enzo, BML-PW8225-0100): Previously used for publication (Mata-Cantero L et al., J Proteomics. 2016 Apr 29;139:45-59.)

PSMA1: Previously used for publication (Kumatori A, Tanaka K et al., Proc Natl Acad Sci U S A. 1990 Sep;87(18):7071-5.)

VCP (Abcam, ab11433): Supplier has been validated in Western blotting of cell lysate from mouse embryonic fibroblasts and CA46 cells, Immunohistochemistry of Human colon carcinoma tissues limited to tumor cells, Flow cytometry analysis of Platelets, and Immunocytochemistry of WiDr colon carcinoma cells.

RAD23B (Santa Cruz, sc-67225): Supplier has been validated in Immunohistochemistry of human cerebellum, and Immunocytochemistry of human cell line U-2 OS cells.

RAD23B (Cell Signaling Technology, 13525): Supplier has been validated in Western blotting of extracts from 293T cells.

RAD23A (Cell Signaling Technology, 24555): Supplier has been validated in Western blotting of extracts from SK-OV-3, OVCAR3, CCRF-CEM, RPMI 8226, 786-O, A498, Hepa 1-6, M-1, C6, A-10, COS-7 and 293T cells.

UBE3A (Protein Tech, 10344-1-AP): Supplier has been validated in Western blotting of mouse brain tissue or extracts from HEK293, HeLa, Jurkat and K562 cells, Immunohistochemistry of paraffin-embedded human lung cancer tissue, and Immunocytochemistry of HeLa cells.

RPL15 (Protein Tech, 16740-1-AP): Supplier has been validated in Western blotting of COLO 320 cells, and Immunocytochemistry of HeLa cells.

RPL7a (Cell Signaling Technology, 2415): Supplier has been validated in Western blotting of extracts from HEK293, NIH/3T3, PC12 and COS cells, and Immunocytochemistry of COS cells.

RPL29 (Abnova, H00006159-B01P): Supplier has been validated in Western blotting of extracts from human pancreas, and Immunocytochemistry of HeLa cells.

RPL35 (Sigma, SAB4500233): Supplier has been validated in Immunoblotting analysis of extracts from 293, HeLa, COLO and Jurkat cells.

RPL4 (Santa Cruz, sc-100838): Supplier has been validated in Immunoblotting analysis of extracts from HeLa, Jurkat and K-562 cells, and Immunocytochemistry of HeLa cells.

RPS2 (Abcam, ab155961): Supplier has been validated in Immunohistochemistry of Human kidney and pancreas tissue, Western blotting of the inhibition assay using HeLa, A549 and 293T cell lysate, and Immunocytochemistry of HeLa cells.

RPS9 (Proteintech, 18215-1-AP): Supplier has been validated in Western blotting of extracts from mouse uterus tissue, Immunocytochemistry of Human kidney, and Immunocytochemistry of HeLa cells.

RPS6 (Cell Signaling Technology, 2217): Supplier has been validated in Western blotting of extracts from HeLa, NIH/3T3, PC12 and COS cells, Immunocytochemistry of human carcinoma, and Immunocytochemistry of mouse brain.

rRNA (Santa Cruz, sc-33678): Supplier has been validated in Immunocytochemistry of HeLa cells.

PML (Santa Cruz, sc-966): Supplier has been validated in Western blotting of extracts from K-562 and COLO cells, Immunohistochemistry of human urinary bladder tissue, and Immunocytochemistry of A-431 cells.

Coilin (Abcam, ab11822): Supplier has been validated in Immunocytochemistry of MCF-7 cells.

BMI1 (Cell Signaling Technology, 6964): Supplier has been validated in Western blotting of extracts from HeLa, A-549, COS-7 and Vero cells, Immunocytochemistry of human lung carcinoma, and Immunocytochemistry of COS-7 cells.

SC35 (Abcam, ab11826): Supplier has been validated in Immunocytochemistry of HEK-293 human kidney cells, and Immunocytochemistry of human hippocampus tissue.

CENPC (MBL, PD030): Supplier has been validated in Western blotting of extracts from HeLa cells, and Immunocytochemistry of HeLa cells.

Phospho-Histone H2A.X (Ser139) (Millipore, 05-636): Supplier has been validated in Western blotting of extracts from Jurkat cells, Immunocytochemistry of Jurkat cells, and Immunohistochemistry of RNF168-WT and RNF168-SA/SEKI mice lung tissue. Previously used for publication (Xe, X., et al. (2015) Nat. Cell Biol. 20 (3); 320-331.), (Meier, Andreas, et al. EMBO J., 26: 2707-18. 2007).

Cleaved Caspase-3 (Asp175) (Cell Signaling Technology, 9661): Supplier has been validated in Western blotting of extracts from HeLa, NIH/3T3 and C6 cells, Immunohistochemistry of human tonsil and mouse embryo, Immunocytochemistry of HT-29 cells, and Flow cytometry analysis of Jurkat cells.

FLAG (Sigma, A8592): Previously used for publication (Tsuchiya H et al., Nat Commun. 2018 Feb 6;9(1):524.), and (Chaumet A et al., Nat Commun. 2015 Sep 10;6:8218.).

Actin (MBL, PM053-7): Supplier has been validated in Western blotting of extracts from PC12 and HEK293 cells.

Eukaryotic cell lines

Policy information about cell lines


Cell line source(s)
HCT116, hTERT RPE-1, HEK293T, and E14Tg2a cells were purchased from American Type Culture Collection (ATCC).

Authentication
Cell line authentication was not performed.

Mycoplasma contamination
All cell lines used were periodically checked by fluorescent microscopy. Cells were stained with DAPI and examined under a ×60 or ×100 objective lens and no contamination was found.

Commonly misidentified lines (See ICLAC register)
No commonly misidentified cell lines were used in this study.

Many biomolecules undergo liquid-liquid phase separation to form liquid-like
condensates that mediate diverse cellular functions. Autophagy is able to
degrade such condensates using autophagosomes, double-membrane structures that
are synthesized de novo at the pre-autophagosomal structure (PAS) in yeast.
Whereas Atg proteins that associate with the PAS have been characterized, the
physicochemical and functional properties of the PAS remain unclear owing to
its small size and fragility. Here we show that the PAS is in fact a
liquid-like condensate of Atg proteins. The autophagy-initiating Atg1 complex
undergoes phase separation to form liquid droplets in vitro, and point
mutations or phosphorylation that inhibit phase separation impair PAS
formation in vivo. In vitro experiments show that Atg1-complex droplets can be
tethered to membranes via specific protein-protein interactions, explaining
the vacuolar membrane localization of the PAS in vivo. We propose that phase
separation has a critical, active role in autophagy, whereby it organizes the
autophagy machinery at the PAS.


The PAS is a transient structure that is regulated by nutrient conditions
and invariably forms on the vacuole in yeast on starvation. The PAS initially
comprises Atg1 complexes consisting of Atg1, Atg13, Atg17, Atg29 and Atg31,
which are abundant with intrinsically disordered regions (IDRs). This 'early
PAS' then matures by recruiting downstream Atg proteins and vesicles,
subsequently serving as the site of autophagosome formation. These features
are consistent with biomolecular condensates (also known as membraneless
organelles) that are formed through liquid-liquid phase separation. We
therefore set out to determine whether the PAS is in fact a biomolecular
condensate, and examined whether condensate formation is able to explain its
spatiotemporal behaviour in the cell.


The PAS is a liquid-like condensate 


First, we studied the dynamics of the PAS using fluorescence microscopy. We
used yeast cells overexpressing GFP-Atg13 in an atg11Δ background to ensure
that fluorescence intensity is sufficient for quantitative analysis and that
the PAS is formed in response to starvation rather than through the
Atg11-dependent pathway, which is constitutive and responsible for
cytoplasm-to-vacuole targeting. We confirmed that overexpression of GFP-Atg13
did not impair autophagy activity (Extended Data Fig. 1a). Upon nitrogen
starvation, GFP-Atg13 formed puncta that dissolved within 8 min of addition of
a nitrogen source (Extended Data Fig. 1b), consistent with a previous study,
suggesting that the PAS is a dynamic and transient entity. Fluorescence
recovery after photobleaching (FRAP) experiments showed a quick recovery of
fluorescence in puncta (corresponding to the PAS) containing GFP-Atg13, with a
recovery half-time of 1.3 s after photobleaching (Fig. 1a, Extended Data
Fig. 1c, Supplementary Video 1). This exchange rate is comparable with or even
faster than that of molecules in nuclear biomolecular condensates. We also
observed rapid fluorescence recovery of Atg1-GFP, Atg13-GFP and Atg17-GFP
proteins expressed at endogenous levels (Extended Data Fig. 1d). We next
performed fluorescence correlation spectroscopy (FCS) experiments on
GFP-Atg13. The diffusion coefficient of GFP-Atg13 in the PAS was about half
that in the cytosol (Fig. 1b, c, Extended Data Fig. 1e). These results provide
evidence supporting a dynamic liquid-like structure of the PAS in which
GFP-Atg13 can move diffusively, although the crowded environment of the PAS
would have slowed down the diffusion, as reported with other liquid-droplet
structures. Owing to the small size of the PAS, the rate determined by FCS
could potentially include both entry and exit of GFP-Atg13 molecules to the
PAS. We next generated and performed FRAP analyses on a giant PAS (diameter
>1 μm) by overexpressing GFP-Atg13 from a multicopy plasmid; Atg13-GFP
fluorescence rapidly recovered (less than 1 s) in a partially quenched region
(Fig. 1d, Extended Data Fig. 1f, Supplementary Video 2). These data show that
Atg13 is not only able to enter and exit the PAS, but also moves freely within
the PAS. Previous studies have established that 1,6-hexanediol is able to
inhibit liquid-liquid phase separation of biomolecules. When yeast cells were
treated with 1,6-hexanediol, Atg13 puncta rapidly dissolved; they reappeared
on the vacuolar membrane after removal of 1,6-hexanediol

Fig. 1 | The PAS behaves as a liquid droplet in vivo. a, Left, rapid recovery of
fluorescence of GFP-Atg13 puncta after photobleaching. Right, fluorescence
recovery fitted to a curve; dissociation rate constant (K_off) and recovery
half-time (t1/2) values (mean ± s.d.) were calculated from n = 3 independent
experiments. DIC, differential interference contrast microscopy. b, FCS
measurements of GFP-Atg13 in the PAS and in the cytoplasm. The
autocorrelation function of the fluorescence signal is calculated for a single
pixel near the centre of the PAS, or for the five pixels in the cytoplasm (insets).
The autocorrelogram was fitted to a three-dimensional one-component
diffusion model to estimate diffusion coefficients. c, Tukey-style
box-and-whisker plots of the diffusion constants of GFP-Atg13 in the PAS
(n = 35 cells) and in the cytoplasm (n = 32 cells) measured by scanning FCS
(see Methods for details). The 95% confidence intervals of the medians are
shown with notches, and are estimated to be 0.79 ± 0.23 μm² s⁻¹ (PAS) and
0.93 ± 0.27 μm² s⁻¹ (cytoplasm). *P = 0.031, two-sided Wilcoxon Mann-Whitney
U-test. d, Partial FRAP experiment with a giant PAS. Graphs indicate line
profiles of fluorescence intensity (FI) in the above images. e, Reversible
effect of 1,6-hexanediol on the formation of Atg13-GFP puncta. Rapa, rapamycin
treatment. f, Appearance of multiple Atg13-GFP puncta and their coalescence to
form one large punctum. g, Left, coalescence of two PAS precursors. Right,
graph shows the change in aspect ratio during coalescence. h, Ostwald ripening
of PAS precursors observed in vivo. i, Line profile of fluorescence intensity
in h. All scale bars are 2 μm. Experiments were repeated independently twice
(e, f) or three times (a, d, g, h) with similar results.


(Fig. 1e) and coalesced to form a larger punctum (Extended Data Fig. 1g,
Supplementary Video 3). To observe the process of PAS formation in detail, we
optimized the expression level of GFP-Atg13 using the inducible GAL1 promoter.
When yeast cells accumulating GFP-Atg13 during a 7-h induction were treated
with rapamycin for 10 min, multiple small GFP-Atg13 puncta appeared and
coalesced to form a large punctum (Fig. 1f, Supplementary Video 4). Upon
coalescence of two puncta, the aspect ratio changed from approximately 2.0 to
1.0 within a few seconds, reflecting the liquid-like nature of puncta (Fig. 1g,
Extended Data Fig. 1h, Supplementary Video 5). We occasionally observed the
enlargement of one PAS punctum coinciding with the reduction of another
(Fig. 1h, i, Extended Data Fig. 1i, Supplementary Video 6). This phenomenon
is consistent with Ostwald ripening, although local phosphorylation events
might dissolve the shrinking PAS punctum. Collectively, these data suggest
that the starvation-induced PAS is a liquid-like biomolecular condensate that
is formed by liquid-liquid phase separation.
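
As an illustration of how a FRAP recovery half-time relates to a rate constant, the following minimal sketch assumes a single-exponential recovery model, which is a common simplification and is not stated in the text to be the fitting procedure used here. The function names and the mobile-fraction parameter are hypothetical. Under this assumption the half-time and the rate constant are linked by t1/2 = ln(2)/k.

```python
import math


def frap_recovery(t, k, mobile_fraction=1.0):
    """Normalized fluorescence at time t (s) after bleaching.

    Assumes single-exponential recovery: F(t) = A * (1 - exp(-k * t)),
    where A is the mobile fraction and k the rate constant (1/s).
    """
    return mobile_fraction * (1.0 - math.exp(-k * t))


def half_time(k):
    """Time (s) at which recovery reaches half of its plateau: ln(2) / k."""
    return math.log(2.0) / k


# Rate constant implied by a 1.3 s half-time, under this model only.
k_example = math.log(2.0) / 1.3

if __name__ == "__main__":
    print(f"k = {k_example:.3f} per second")
    print(f"recovery at 1.3 s = {frap_recovery(1.3, k_example):.2f}")
```

In practice the rate constant would be obtained by fitting measured intensities rather than by inverting a reported half-time; the sketch only shows the relationship between the two quantities.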


302 | Nature | Vol578 | 13 February 2020 


[Fig. 2 graphic. Recoverable labels: a, predicted disordered residues (%) for Atg1, Atg13, Atg17, Atg29 and Atg31; b, number of droplets and average area (AU) against time (min); d, aspect ratio against time (s).]


Fig. 2 | The Atg1 complex undergoes phase separation in vitro. a, Percentage
of residues predicted to be disordered by DISOPRED. b, Formation of liquid
droplets of the Atg1 complex on mixing. The bottom graph shows the
mean ± s.d. of number and area of the droplets (n = 3 independent
experiments). Scale bar, 50 μm. c, Coalescence observed between the droplets
of the Atg1 complex. Scale bar, 5 μm. d, Time course of the changes in aspect
ratio during coalescence in c. e, Co-occurrence of Atg1, Atg13 and Atg17 in the
Atg1-complex droplets. Scale bar, 30 μm. f, Interactions of Atg13 17BR (left)
and 17LR (right) with Atg17 (PDB 5JHF). Residue numbers refer to the
Saccharomyces cerevisiae Atg17 sequence. g, Formation of scaffold-complex
liquid droplets and their inhibition by mutations. Scale bar, 50 μm.
Experiments were repeated independently three times with similar
results (c, e, g).


The Atg1 complex forms droplets in vitro 


We previously reported that a higher-order assemblage of the Atg1
complex organizes the PAS and initiates autophagy. With the exception of
Atg17, the components of the Atg1 complex contain many IDRs (Fig. 2a, Extended
Data Fig. 2a). We speculated that the higher-order assemblage of the Atg1
complex induces liquid-liquid phase separation, providing a mechanism for
formation of a liquid-like PAS. SNAP-tagged Atg1, Atg13, and
Atg17-Atg29-Atg31 were purified (Extended Data Fig. 2b) and labelled with
distinct fluorescent dyes. Mixing of these proteins resulted in the immediate
onset of phase separation, indicated by the appearance of multiple spherical
droplets, the number of which reached a maximum at 15 min before a subsequent
decline, whereas the total area occupied by the droplets increased
continuously (Fig. 2b). Again, droplets coalesced to form a larger spherical
droplet, during which the aspect ratio changed from approximately 2.0 to 1.0,
suggesting a liquid-like state (Fig. 2c, d, Extended Data Fig. 2c). Droplets
emitted three distinct fluorescence signals corresponding to Atg1, Atg13 and
Atg17-Atg29-Atg31, indicating that these components colocalize within
droplets (Fig. 2e). Components of biomolecular condensates can be divided into
two qualitative classes: scaffolds, which are essential for the formation of
condensates, and clients, which are dispensable. As Atg1 was dispensable for
droplet formation (Extended Data Fig. 2d, buffer), Atg1 can be considered as a
client, whereas the other components act as scaffolds; we therefore refer to
the Atg13-Atg17-Atg29-Atg31 droplets as scaffold droplets. These scaffold
droplets were promptly dispersed by 1,6-hexanediol treatment (Extended Data
Fig. 2d, e), similar to the PAS in vivo (Fig. 1e). Phase separation was most
efficient at pH 6.0 and was impaired under higher pH conditions (>7.0)
(Extended Data Fig. 2f, g), consistent with PAS formation under starvation
conditions that result in acidification of the cytoplasm of budding yeast to
a pH around 6.0.
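
The change in aspect ratio from about 2.0 to 1.0 during coalescence can be pictured with a simple relaxation curve. The sketch below assumes an exponential decay of the aspect ratio towards 1.0 with a characteristic time tau, a standard description of viscous droplet fusion that is not stated in the text as the model used for these data. The function names and the value of tau are hypothetical.

```python
import math


def aspect_ratio(t, tau, ar0=2.0):
    """Aspect ratio at time t (s) for two fusing droplets.

    Assumes exponential relaxation towards a sphere (aspect ratio 1.0):
    AR(t) = 1 + (AR0 - 1) * exp(-t / tau).
    """
    return 1.0 + (ar0 - 1.0) * math.exp(-t / tau)


def time_to_reach(ar_target, tau, ar0=2.0):
    """Time (s) needed to relax from ar0 down to ar_target (1 < ar_target <= ar0)."""
    if not 1.0 < ar_target <= ar0:
        raise ValueError("ar_target must lie in the interval (1, ar0]")
    return -tau * math.log((ar_target - 1.0) / (ar0 - 1.0))


if __name__ == "__main__":
    tau = 5.0  # hypothetical relaxation time in seconds
    for t in (0, 5, 10, 20):
        print(t, round(aspect_ratio(t, tau), 3))
```

A shorter tau corresponds to faster rounding after contact, so fitting tau to a measured time course gives one number for comparing how liquid-like different droplets are.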

It is known that liquid-liquid phase separation of proteins is induced
by two distinct mechanisms: the first involves nonspecific weak interactions
between IDRs, whereas the second is brought about by multiple specific
interactions between proteins that possess numerous binding modules. Atg13
possesses both an Atg17-binding region (17BR) and an Atg17-linking region
(17LR), which bind to distinct regions in Atg17 (Fig. 2f). Formation of
scaffold droplets was severely impaired by the F430A or F375A mutation in
Atg13 and the D247A or P393A mutation in Atg17 (Fig. 2g, Extended Data
Fig. 2h), which attenuate the Atg17-17BR and Atg17-17LR interactions that are
essential for PAS formation and autophagy. Thus, phase separation of the Atg1
complex is facilitated by multiple specific interactions between Atg13 and
Atg17, both in vitro and in vivo. The 1:1 stoichiometry of Atg13 and
Atg17-Atg29-Atg31 is optimal for phase separation, which is impaired by excess
Atg17-Atg29-Atg31 (Extended Data Fig. 2i). In line with this, overexpression
of Atg17-GFP impaired droplet liquidity (Extended Data Fig. 2j). Collectively,
these data suggest that the Atg1 complex undergoes phase separation to form a
liquid droplet through two-site binding between Atg13 and Atg17, an essential
mechanism of PAS formation in vivo.


Regulation of phase separation 


Under nutrient-rich conditions, Atg13 is highly phosphorylated by
TORC1, which inhibits the formation of the Atg1 complex and the PAS. Some Atg
proteins accumulate en masse at the PAS when the kinase activity of Atg1 is
inhibited. These observations suggest that phosphorylation events negatively
regulate the formation of the PAS. When Atg13 was incubated with TORC1
purified from yeast (Extended Data Fig. 3a) and ATP, Atg13 was
hyperphosphorylated, including on Ser428 and Ser429, whose phosphorylation
impairs the interaction of Atg13 with Atg17 as well as PAS formation
(Fig. 3a). Phosphorylated Atg13 lost the ability to form droplets with
Atg17-Atg29-Atg31 (Fig. 3b), indicating that TORC1 inhibits phase separation
by directly phosphorylating Atg13, especially at Ser428 and Ser429.

The kinase activity of Atg1, which is essential for autophagy progression,
is activated upon starvation. Activation of Atg1 requires autophosphorylation
of the kinase domain at Thr226, which is markedly enhanced following rapamycin
treatment. Clustering of Atg1 by a selective autophagy cargo or by its
targeting to the vacuole accelerates the autophosphorylation. We monitored
phosphorylation of Thr226 using a phosphorylation-specific antibody for Thr226
when Atg1 alone, phase-separated Atg1 complex or an Atg1 complex in which
phase separation was inhibited by an F430A mutation in Atg13 were incubated
with ATP (Extended Data Fig. 3b). There was no increase in autophosphorylated
protein for Atg1 alone, but a mild increase was observed in the Atg1(F430A)
complex, and a more marked increase was observed in the phase-separated Atg1
complex (Fig. 3c, Extended Data Fig. 3c). These data suggest that phase
separation of the Atg1 complex facilitates activation of Atg1 kinase, possibly
by increasing collisions between individual Atg1 molecules.

Next, we studied the effect of Atg1 kinase activity on droplet formation.
Addition of ATP induced phosphorylation of Atg1, Atg13 and Atg29 within 10 min
in the Atg1 complex, but not in the Atg1(D211A) kinase-dead complex,
indicating that these three proteins were phosphorylated by Atg1 (Extended
Data Fig. 3d, e). Incubation with ATP dissolved the droplets of the wild-type
Atg1 complex, but not those of the Atg1(D211A) complex, in a similar time
frame (Extended Data Fig. 3f), confirming that Atg1-mediated phosphorylation,
but not the activity of ATP as a hydrotrope, inhibited phase separation of the
Atg1 complex.

In contrast to the in vitro observation that Atg1-complex droplets 
dissolve within minutes in the presence of Atg1 kinase activity, the

Fig. 3 | Phase separation of the Atg1 complex is dynamically controlled by
phosphorylation-dependent regulation of Atg13. a, Phosphorylation of
Atg13 (P-Atg13) by TORC1. CBB, Coomassie brilliant blue staining. b, Left,
impairment of phosphorylated Atg13 phase separation on mixing with
Atg17-Atg29-Atg31. Scale bar, 30 μm. Right, graph shows time-course analysis of
droplet area. Data are mean ± s.d. (n = 3 independent experiments) (b, d, e). c,
Enhancement of Atg1 Thr226 phosphorylation on phase separation in vitro. d,
Left, effect of ATP and Ptc2 on phase separation of the Atg1 complex. Right,
bar, 10 μm. e, Left, phosphorylation of Atg1 at Thr226 and Atg13 at Ser428/429
as assessed by western blot analyses. Right, relative band intensity quantified
from blots. Experiments were repeated independently three times with similar
results (a, c). For gel source data, see Supplementary Fig. 1 (c, e).


PAS continues to exist for several hours despite its activation of Atg1
kinase. The PP2C phosphatases Ptc2 and Ptc3 have been reported to promote PAS
formation and autophagy by dephosphorylating Atg1 and Atg13. This suggests
that the balance of phosphorylation and dephosphorylation of Atg13 may be
important for the maintenance of the PAS. We studied the effect of Ptc2 on
phase separation of the Atg1 complex (Fig. 3d). Addition of ATP promptly
dissolved Atg1-complex droplets. Further addition of recombinant Ptc2
(Extended Data Fig. 3g, h) gradually regenerated the droplets, indicating that
Ptc2-mediated dephosphorylation promotes phase separation of the Atg1
complex. Upon ATP addition, phosphorylation of Atg1 at residue Thr226, and
shortly afterwards, of Atg13 at Ser428 and Ser429 (Ser428/429) occurred.
Thr226 and Ser428/429 were then dephosphorylated after addition of Ptc2
(Fig. 3e). As Ser428/429 of Atg13 is the most critical phosphorylation site
for inhibiting PAS formation, these results suggest that Atg1 and Ptc2
regulate the reversible phase separation of the Atg1 complex through
phosphorylating and dephosphorylating Atg13 Ser428/429. Notably,
dephosphorylation at Atg13 Ser428/429 is faster than that at Atg1 Thr226
(Fig. 3e, 30-70 min). We thus conclude that Ptc2-mediated dephosphorylation is
one mechanism by which phase separation of the PAS is maintained while
simultaneously retaining a subpopulation of Atg1 in an activated state.


Dynamic structure of scaffold droplets 


We next performed structural analysis of the scaffold droplets using 
high-speed atomic force microscopy (HS-AFM). Molecules with an 
S-shape, a distinctive feature of Atg17, are distributed irregularly

Fig. 4 | The surface structure of scaffold droplets is irregular and dynamic.
a,b, Visualization of S-shaped Atg17 molecules in scaffold droplets on non- 
coated (a) and 3-aminopropyltriethoxysilane-coated (b) coverslips using HS- 
AFM in combination with fluorescence microscopy. Broken lines indicate the 
position of the cantilever used for HS-AFM observation. Inset shows Atg17 


throughout these droplets, confirming that droplets are not regularly
structured (Fig. 4a). S-shaped Atg17 molecules exhibited dynamic behaviour in
a restricted area within the droplets (Supplementary Video 7), providing
further evidence that the droplets are in a liquid-like state. By contrast,
Atg17 was arranged in a regular pattern and showed little movement in droplets
that were loaded onto a positively charged coverslip (Fig. 4b, Supplementary
Video 8), suggesting that the droplets can mature to a static, solid-like
structure, depending on the environment. When FRAP experiments were performed
on scaffold droplets, we observed that photobleaching impaired coalescence of
droplets and that Atg17 fluorescence rarely recovered (Extended Data Fig. 4).
Collectively, these observations indicate that the droplets are liquid-like
and randomly structured, and that the droplets eventually mature into a
static, ordered state in vitro, as has been observed in other biomolecular
condensates that transition from liquid droplets to solid-like states such as
gels, glasses and amyloids.


In vitro reconstitution of the early PAS


PAS formation occurs on the vacuolar membrane. Previous studies
indicate that the autophagy-related vacuolar membrane protein Vac8 interacts
directly with Atg13. Monitoring of GFP-Atg1, Atg5-GFP and GFP-Atg8 during
starvation showed that puncta containing these proteins were mostly attached
to the vacuolar membrane in wild-type cells, whereas about half were detached
from the vacuolar membrane in vac8Δ cells (Fig. 5a, b, Extended Data Fig. 5a,
b). This suggests that Vac8 is at least partly responsible for tethering the
PAS to the vacuolar membrane. We next investigated whether Atg1-complex
droplets could also be tethered to membranes via Vac8 using giant unilamellar
vesicles (GUVs). Atg1-complex droplets were tethered to Vac8-anchored GUVs but
not to GUVs lacking Vac8 (Fig. 5c, d, Extended Data Fig. 5c); Atg1 was
dispensable for droplet binding to Vac8 GUVs (Fig. 5e). The scaffold droplets
were only rarely tethered to GUVs in the presence of the Vac8 mutant that lost
the affinity with Atg13, further confirming that tethering occurs through a
specific Atg13-Vac8 interaction (Fig. 5f, Extended Data Fig. 5d, e). The
number of droplets tethered to wild-type Vac8 GUVs was rapidly reduced, while
their size increased through coalescence events (Fig. 5g, h, Extended Data
Fig. 5f, Supplementary Video 9). This result is consistent with our in vivo
and in vitro findings (Fig. 1f, g, 2c),

fast Fourier transform-bandpass filter. Experiments were repeated
independently three times with similar results (a, b). a, b, Scale bars, 5 um 
(left), 100 nm (a, right) and 200 nm (b, right). 


revealing that the droplets remain in a liquid-like state, even in GUVs. 
FRAP experiments revealed that Atg1-complex droplets tethered to the 
membrane exchanged 60-100% of their constituent Atg1 molecules 
within 3 min (Extended Data Fig. 5g), providing further evidence for 
the liquid-like nature of droplets, even when on membranes. On the 
basis of these observations, we conclude that a structure similar to 
the early PAS was reconstituted in vitro using purified proteins and 
synthetic liposomes, demonstrating that the early PAS is a liquid-like 
condensate that is tethered to the vacuolar membrane through a spe- 
cific protein-protein interaction. 


Discussion 


The relationship between biomolecular condensates and autophagy 
is generally thought of in passive terms: condensates are targeted 
for degradation by autophagy”. Our results challenge this notion, 
instead suggesting that the PAS—the central driver of the autophagy 
mechanism—is a liquid-like biomolecular condensate (summarized in 
Extended Data Fig. 6). Phase separation is implicated at a fundamental 
level in PAS formation, with the interactions between IDR-containing 
Atgl-complex components critical for the early PAS. In previous work, 
extensive structural analyses have been performed on the Atg1 complex 
that have established the stoichiometry of Atg1-Atg13, Atg13-Atg17 
and Atg17-Atg29-Atg31 interactions’°”. However, in these studies, the 
majority of IDRs were removed for technical reasons, preventing the 
observation of Atg1-complex phase-separation events. Here we reveal 
that cross-linking of Atg17 dimers by the long IDR of Atg13 is the main 
mechanism of phase separation, a finding supported by a previous 
structural study describing specific and multivalent Atg13-Atg17 inter- 
actions®. The resulting liquidity of the PAS is critical for its function in 
dynamic recruitment of Atg proteins throughout autophagosome for- 
mation: for example, liquidity probably results in the concentration and 
activation of Atg1 kinase for autophagy initiation, and would facilitate 
the incorporation of Atg9 vesicles, the initial source of autophagosomal 
membranes’, in a manner reminiscent of liquid-phase synapsin clustering
of vesicles at synapses”. For these functions, the liquidity of the
PAS, which is easily lost by maturation, as observed in vitro (Fig. 4), is
maintained in cells through formation and dissolution events that are
mediated by a combination of kinases and phosphatases.

Fig. 5| The liquid-like PAS is tethered to the vacuolar membrane via specific
protein-protein interactions. a, Observation of the PAS by monitoring GFP-Atg1
in wild-type and vac8Δ cells treated with rapamycin for 3 h. Broken circles
indicate the vacuole. Scale bar, 5 μm. b, Proportion of GFP-Atg1 dots observed
adjacent to the vacuole. Data are mean ± s.d. (n = 3 (b), 20 (d) and 30 (f)
independent experiments). *P = 0.0126, two-sided t-test. c, Tethering of
Atg1-complex droplets to a Vac8-anchored GUV. d, Quantification of the number of
Atg1-complex droplets tethered to GUVs after equilibrium. ****P=1.3 x10™°,


Methods


No statistical methods were used to predetermine sample size. The 
experiments were not randomized. The investigators were not blinded 
to allocation during experiments and outcome assessment. 


Yeast strains and media 

S. cerevisiae strains and plasmids used in this study are listed in Sup- 
plementary Tables 1 and 2, respectively. Standard protocols were used 
for yeast manipulation”. Cells were cultured at 30 °C in nutrient-rich 
SD + CA medium (0.17% yeast nitrogen base without amino acids and 
ammonium sulfate, 0.5% ammonium sulfate, 0.5% casamino acids, and 
2% glucose) supplemented with appropriate nutrients. Autophagy was 
induced by transferring cells to nitrogen-starvation SD(-N) medium 
(0.17% yeast nitrogen base without amino acids and ammonium sulfate, 
and 2% glucose) or by treating cells with 0.5 μg ml⁻¹ rapamycin (Sigma-Aldrich).
Treatment of yeast cells with 1,6-hexanediol was performed
by adding 1,6-hexanediol and digitonin at final concentrations of 10%
and 10 μg ml⁻¹, respectively, to the medium”. For replenishing nutrients
to starved cells, an equal volume of 2x SD + CA medium (doubled
concentration of each constituent) was added to culture media. 

For galactose induction of GFP-Atg13 protein using the GAL1 promoter,
cells were cultured at 30 °C in nutrient-rich SD + CA medium
(0.17% yeast nitrogen base without amino acids and ammonium sulfate,
0.5% ammonium sulfate, 0.5% casamino acids, 2% raffinose and 0.1%
glucose) supplemented with appropriate nutrients. The following day,
the cell culture (OD600 = 1.0-2.0) was supplemented with galactose to a
final concentration of 2%, and cultured for an additional 6 h. Autophagy
was induced by treating cells with 0.5 μg ml⁻¹ rapamycin for 10 min.


Construction of expression plasmids 

The pRS316-based low-copy plasmids for expression of GFP-Atg1 and
GFP-Atg13 in yeast under the control of the ATG1 and GPD promoters,
respectively, were constructed as described previously’®. The pRS426-based
multi-copy plasmids for expression of GFP-Atg13 in yeast under
control of the GPD promoter or the GAL1 promoter were
constructed similarly. To construct coexpression plasmids encoding
SNAP-tagged Atg17, Atg29 and hexahistidine (His6)-tagged Atg31, genes
were amplified by PCR and cloned into the pET28a (+) vector (Novagen)
for SNAP-tagged Atg17 and the pACYCDuet-1 vector (Novagen) for
Atg29 and His6-tagged Atg31. To construct expression plasmids encoding
N-terminal His6-tagged and C-terminal SNAP-Twin-Strep-tagged
Atg13, genes were amplified by PCR and cloned into the pET11a vector.
To construct expression plasmids encoding N-terminal glutathione- 
S-transferase (GST)-tagged and C-terminal SNAP-tagged Vac8, genes 
were amplified by PCR and cloned into the pGEX6p-1 vector. To con- 
struct expression plasmids encoding N-terminal GST-tagged Ptc2, 
genes were amplified by PCR and cloned into the pGEX6p-1 vector. To 
construct expression plasmids encoding the N-terminal SNAP-tagged 
Atg1 with a HRV3C protease site followed by Flag and His6 tags, the
SNAP-tag gene was amplified by PCR and cloned into the pFastBac 
Dual-based Atg1 expression vector®. The NEBuilder HiFi DNA Assembly 
Cloning Kit (New England Biolabs) was used for cloning. Mutations 
to generate the indicated amino acid substitutions were introduced 
by PCR-mediated site-directed mutagenesis. All constructs were 
sequenced to confirm accuracy of cloning. 


Protein expression and purification

E. coli strain BL21(DE3) cells were used for expression of all recombinant
proteins except Atg1. His6-tagged Atg31 was coexpressed with
SNAP-Atg17 and Atg29. After cell lysis, the SNAP-Atg17-Atg29-Atg31
complex was purified by affinity chromatography using a Ni-NTA column
(Qiagen). After affinity chromatography, the protein complex
was purified on a HiLoad 26/60 Superdex 200 PG column (GE Healthcare)
eluted with 20 mM Tris-HCl pH 8.0 and 150 mM NaCl. N-terminal
His6-tagged and C-terminal SNAP-Twin-Strep-tagged Atg13 was first
purified with a Ni-NTA column and then purified using a Strep-TactinXT
resin column (IBA Lifesciences). Finally, the proteins were purified on a
HiLoad 26/60 Superdex 200 PG column eluted with 20 mM HEPES pH
7.0 and 500 mM NaCl. GST-Vac8-SNAP was first purified using a
glutathione-Sepharose 4B (GS4B) column (GE Healthcare). After affinity
chromatography, GST was excised using human rhinovirus 3C protease.
Vac8-SNAP was again applied to a GS4B column in order to remove
the excised GST. Finally, the protein was purified on a HiLoad 26/60
Superdex 200 PG column eluted with 20 mM Tris-HCl, pH 8.0 and 150
mM NaCl. GST-Ptc2 was first purified using a glutathione-Sepharose
4B (GS4B) column (GE Healthcare). After affinity chromatography, GST
was excised using human rhinovirus 3C protease. Ptc2 was again applied
to a GS4B column in order to remove the excised GST. Recombinant Atg1
was expressed using the baculovirus expression system (Invitrogen)
and then purified as described previously®. The SNAP tags of Atg1, Atg13 and
Atg17 were labelled with SNAP-Surface Alexa Fluor 488, SNAP-Surface
549 and SNAP-Surface Alexa Fluor 647 (all from New England Biolabs),
respectively, according to the manufacturer’s protocol, except for the
samples used for AFM experiments, where the SNAP tag of Atg17 was
labelled with SNAP-Surface Alexa Fluor 488. For GUV experiments,
proteins were dialysed against 20 mM HEPES pH 7.0 and 500 mM NaCl
using dialysis tubes, 8 kDa cut-off (GE Healthcare).


FRAP measurements and analysis 

For FRAP experiments assessing the PAS, cells treated with rapamycin
were imaged on concanavalin A-coated glass-bottom dishes (Mattek)
to immobilize cells. For FRAP experiments of Atg1-complex droplets
attached to a giant liposome, multilamellar liposomes instead
of GUVs were used in order to reduce the movement of droplets on
the liposome. During FRAP experiments, Atg1 sample was continuously
added in the vicinity of the liposome using a micropipette. FRAP
experiments were carried out with a FV3000RS confocal laser scanning
microscope (Olympus) equipped with an UPLSAPO60XO, NA 1.42 Oil
objective (Olympus). For imaging of GFP and SNAP-Surface Alexa Fluor
488 fluorescence, excitation was performed using a 488-nm laser and
fluorescence was recorded in a linear sequential mode using a galvano
scanner to capture one z-stack for the PAS and 6.9-μm z-stacks with
1.4-μm spacing for the liposome-tethered droplets. Photobleaching
was performed using 405-nm and 488-nm laser pulses (1 repeat, 10%
intensity, dwell time 5-50 ms) for the PAS and 488-nm laser pulses (1
repeat, 10% intensity, dwell time 45 ms) for the liposome-tethered
droplets. Image analysis was carried out with FIJI v.1.52e” or FV31S-SW
v.2.1.1.98 (Olympus). For kinetic analysis, relative fluorescence intensity
was plotted against time by setting the intensity before quenching as
1.0 and the minimum intensity after quenching as 0.0, and fitted to
an exponential recovery curve: $F = A_0\,(1 - \exp(-k_{\mathrm{off}}\,t))$, in which $A_0$ is the
maximum recovery at $t = \infty$ and $t$ is time in seconds. This equation
was used to determine $k_{\mathrm{off}}$. The $t_{1/2}$ value was calculated as $\ln(2)/k_{\mathrm{off}}$.
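
The normalization and fit described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used in this study: the function names, the search bounds for the rate constant and the golden-section search are all choices made here for the example. For a fixed rate constant the best-fitting amplitude has a closed form, so only the rate constant is searched.

```python
import math

def normalise(raw, pre_bleach):
    """Rescale so the pre-bleach intensity is 1.0 and the post-bleach minimum is 0.0."""
    f_min = min(raw)
    return [(f - f_min) / (pre_bleach - f_min) for f in raw]

def fit_frap(times, intensities, k_lo=1e-4, k_hi=10.0, iters=200):
    """Least-squares fit of F(t) = A0 * (1 - exp(-k_off * t)).

    times must contain at least one t > 0. The bounds k_lo and k_hi (per second)
    are hypothetical defaults for this sketch.
    """
    def best_a0(k):
        # For fixed k, the least-squares amplitude is a simple projection.
        g = [1.0 - math.exp(-k * t) for t in times]
        return sum(f * gi for f, gi in zip(intensities, g)) / sum(gi * gi for gi in g)

    def sse(log_k):
        k = 10.0 ** log_k
        a0 = best_a0(k)
        return sum((f - a0 * (1.0 - math.exp(-k * t))) ** 2
                   for t, f in zip(times, intensities))

    # Golden-section search on log10(k_off).
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log10(k_lo), math.log10(k_hi)
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if sse(c) < sse(d):
            b = d
        else:
            a = c
    k_off = 10.0 ** ((a + b) / 2.0)
    return best_a0(k_off), k_off, math.log(2.0) / k_off
```

The function returns the maximum recovery, the rate constant and the half-time, in that order. A search on the logarithm of the rate constant is used because plausible values span several orders of magnitude.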


Fluorescence correlation spectroscopy 

The FCS data shown in Fig. 1 were taken with a TCS SP5 II confocal microscope
(Leica). An HC PLAPO 40x/1.10 W CORR CS2 objective lens (Leica)
was used for imaging, and the signal was detected with a hybrid detector
(HyD, Leica) in a photon-counting mode, with the scanning frequency
$f_s$ = 8,000 Hz and the number of pixels $\ell_{\max}$ = 16. The size of the detection
area was $S_\ell$ = 403 nm, comparable to the size of the diffraction-limited
point spread function (PSF) of the observation lens ($w_{xy}$ = 290 nm and
$w_z$ = 1,450 nm).

Further, we employed scanning FCS (sFCS)*” to minimize photobleaching
and the effect of movement of the PAS itself.

In sFCS, a single line scanning data set was taken with a scanning
frequency $f_s$. The intensity of each pixel $\ell$ ($1 \le \ell \le \ell_{\max}$) is the integration
of fluorescent signals for a pixel dwell time $\tau_p$. The illumination volume
moves from the leftmost position $\ell = 1$ to the rightmost position $\ell_{\max}$,
and returns from $\ell = \ell_{\max}$ to 1. The scanner repeats the same procedure
every $T_s = 1/f_s$ seconds. For each pixel, the intensity time traces $F_\ell(t_i)$ at
time $t_i$ ($i = 0, 1, 2, \ldots, 2n - 1$) are used to calculate the temporal
autocorrelation curve (where $n$ denotes the number of scans):

$$G_\ell(\tau_j) = \frac{\langle \delta F_\ell(t_i)\,\delta F_\ell(t_i + \tau_j)\rangle}{\langle F_\ell(t_i)\rangle\,\langle F_\ell(t_i + \tau_j)\rangle}$$

where the angled brackets denote the average over a long time period
at each pixel position $\ell$. The fluctuation of fluorescence is
$\delta F_\ell(t_i) = F_\ell(t_i) - \langle F_\ell(t_i)\rangle$ and the lag time is $\tau_j$.
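
A per-pixel autocorrelation of this kind can be sketched as follows. The sketch assumes that the samples of each pixel are evenly spaced in time, which is a simplification for pixels away from the centre of a bidirectional scan, and that curves from different pixels are simply averaged; neither assumption is stated in the text, and the function names are hypothetical.

```python
def pixel_autocorrelation(trace, max_lag):
    """Normalised autocorrelation of one pixel's intensity trace for lags 1..max_lag.

    Numerator: mean of dF(t_i) * dF(t_i + lag), with dF = F - <F>.
    Denominator: product of the mean intensities of the two overlapping segments.
    """
    n = len(trace)
    mean = sum(trace) / n
    delta = [f - mean for f in trace]
    curve = []
    for lag in range(1, max_lag + 1):
        m = n - lag  # number of overlapping sample pairs
        numerator = sum(delta[i] * delta[i + lag] for i in range(m)) / m
        head = sum(trace[:m]) / m
        tail = sum(trace[lag:]) / m
        curve.append(numerator / (head * tail))
    return curve

def scan_autocorrelation(scan, max_lag):
    """scan[i][l] is the intensity of pixel l at sample i; returns the pixel-averaged curve."""
    n_pixels = len(scan[0])
    curves = [pixel_autocorrelation([row[l] for row in scan], max_lag)
              for l in range(n_pixels)]
    return [sum(c[j] for c in curves) / n_pixels for j in range(max_lag)]
```

Lags are expressed in units of the sampling interval; multiplying by that interval converts them to seconds.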

The calculated autocorrelation function $G(\tau)$ was fitted similarly as
the conventional fixed point FCS with a 3D one-component model:

where $\gamma$ is a geometric factor dependent on the shape of the focal volume
(0.35 for 3D Gaussian), $N$ is the number of molecules in the focal
volume, as calculated by $N = \langle C\rangle\,w_{xy}^2 w_z (\pi/2)^{3/2}$ (where $\langle C\rangle$ denotes the
average concentration of molecules in the PSF), $D$ is a diffusion coefficient
in μm² s⁻¹, and $w_{xy}$ and $w_z$ are the radial and axial waist of the PSF. We
usually calculate the confocal volume as $V_{\mathrm{conf}} = (\pi/2)^{3/2} w_{xy}^2 w_z$, enabling
us to interpret the factor $w_{xy}^2 w_z \pi^{3/2}$ as the effective volume $V_{\mathrm{eff}}$, where the
particles are actually detected for sources of fluorescence.
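
For orientation, the quantities defined above can be evaluated numerically. The model function below uses the textbook form of a 3D one-component free-diffusion curve, $G(\tau) = (\gamma/N)(1 + 4D\tau/w_{xy}^2)^{-1}(1 + 4D\tau/w_z^2)^{-1/2}$; this form is an assumption made for the sketch because it matches the symbols defined here, and the function names are hypothetical.

```python
import math

GAMMA_3D_GAUSSIAN = 2.0 ** -1.5  # about 0.35 for a 3D Gaussian focal volume

def g_model(tau_s, n_molecules, d_um2_per_s, w_xy_um, w_z_um, gamma=GAMMA_3D_GAUSSIAN):
    """Assumed 3D one-component diffusion model (tau in s, D in um^2/s, waists in um)."""
    radial = 1.0 / (1.0 + 4.0 * d_um2_per_s * tau_s / w_xy_um ** 2)
    axial = 1.0 / math.sqrt(1.0 + 4.0 * d_um2_per_s * tau_s / w_z_um ** 2)
    return gamma / n_molecules * radial * axial

def confocal_volume_fl(w_xy_um, w_z_um):
    """V_conf = (pi/2)^(3/2) * w_xy^2 * w_z; 1 um^3 equals 1 fl."""
    return (math.pi / 2.0) ** 1.5 * w_xy_um ** 2 * w_z_um

def effective_volume_fl(w_xy_um, w_z_um):
    """V_eff = pi^(3/2) * w_xy^2 * w_z."""
    return math.pi ** 1.5 * w_xy_um ** 2 * w_z_um

# Waists quoted for the observation lens: 290 nm radial, 1,450 nm axial.
v_conf = confocal_volume_fl(0.29, 1.45)
v_eff = effective_volume_fl(0.29, 1.45)
```

With these waists the confocal volume comes to roughly 0.24 fl, and the effective volume is larger by a factor of $2^{3/2}$, which is the reciprocal of the geometric factor.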

However, the original sFCS itself was based on the slow Galvo scan- 
ner, which limited its applications to slowly diffusing molecules. We 
therefore used a Galvo-resonant scanner for high-speed scanning. 
Furthermore, slowly fluctuating noise was problematic during measurement
of living cells due to the movement of subcellular structures.
Such noise was statistically removed by the wavelet-based method 
described in the following section. 


Noise removal and data correction for FCS analysis 
For correction of photobleaching effects, we averaged the intensity
time trace over all pixels, $\bar{F}(t_i) = \langle F_\ell(t_i)\rangle_\ell$, and approximated it as an
exponential decay curve, $\bar{F}(t_i) \approx f(t_i) = f_0\,e^{-t_i/\tau_b}$. When we assume that the
intensity $F_\ell(t_i)$ obeys a Poisson distribution with mean $\mu_i = f(t_i)$ and
variance $\sigma_i^2 = f(t_i)$, we can correct the standard deviation of the intensity
and the photobleaching as $F^c_\ell(t_i) = f(0) + (F_\ell(t_i) - \mu_i)\,\sigma_0/\sigma_i$, and we have the
corrected intensity $F^c_\ell(t_i)$ as in ref. *:

$$F^c_\ell(t_i) = \frac{F_\ell(t_i)}{\sqrt{f(t_i)/f(0)}} + f(0)\left(1 - \sqrt{f(t_i)/f(0)}\right)$$

For further correction, we applied wavelet-based smoothing*** to
subtract the inhomogeneous intensity fluctuation $\tilde{F}_\ell(t_i)$, which is
calculated as follows. We applied a wavelet decomposition of the intensity
signal $F^c_\ell(t_i)$, which can be achieved by using a scale function $\varphi$ and a
wavelet function $\psi$ in multiresolution analysis. In this paper, we used
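the Haar scaling and wavelet functions as

$$\varphi(t) = \begin{cases} 1, & 0 \le t < 1 \\ 0, & \text{otherwise} \end{cases}$$

$$\psi(t) = \begin{cases} 1, & 0 \le t < \tfrac{1}{2} \\ -1, & \tfrac{1}{2} \le t < 1 \\ 0, & \text{otherwise} \end{cases}$$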

If we represent the discrete intensity signal $F^c_\ell(t_i) = \{y_0, y_1, \ldots, y_{n-1}\}$ with
the length $n = 2^J$ ($J > 0$), the discrete wavelet transform gives the vector
of wavelet coefficients with the length $n$. The coarse approximation of
the signal is represented by a linear combination of the shifted scale
functions $\varphi_{j,k}(t) = 2^{j/2}\varphi(2^j t - k)$, where $j$ and $k$ are the scale and location
of the scale function, respectively. The weights for this function are
scale coefficients $c_{j,k}$. The residual details of the signal which are omitted
from the coarse approximation can be expressed by a linear combination
of the shifted wavelet functions $\psi_{j,k}(t) = 2^{j/2}\psi(2^j t - k)$ with the
weights $d_{j,k}$. If the coarse approximation level $L < J$ is given, the signal
is represented by the wavelet coefficients with $2^L$ scale and $2^j$ detail
coefficients, $\{c_{L,0}, c_{L,1}, \ldots, c_{L,2^L-1}\}$ and $\{d_{j,0}, d_{j,1}, \ldots, d_{j,2^j-1}\}$ ($j = L, \ldots, J-1$),
respectively.
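
The decomposition into scale and detail coefficients can be illustrated with a short, self-contained Haar transform. This sketch uses the orthonormal discrete convention (sums and differences of neighbouring values divided by the square root of 2) and hypothetical function names; it is meant to show the bookkeeping of levels, not to reproduce the analysis code.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_decompose(signal, coarse_level):
    """Split a signal of length 2**J into 2**L scale coefficients (L = coarse_level)
    and detail coefficients for levels j = L, ..., J-1 (2**j values at level j)."""
    n = len(signal)
    top = n.bit_length() - 1  # J
    if n != 1 << top or not 0 <= coarse_level < top:
        raise ValueError("length must be 2**J with 0 <= coarse_level < J")
    approx = list(signal)
    details = {}
    for j in range(top - 1, coarse_level - 1, -1):
        pairs = [(approx[2 * k], approx[2 * k + 1]) for k in range(1 << j)]
        details[j] = [(a - b) / SQRT2 for a, b in pairs]  # psi: +1 then -1
        approx = [(a + b) / SQRT2 for a, b in pairs]      # phi: +1 on both halves
    return approx, details

def haar_reconstruct(approx, details):
    """Inverse of haar_decompose."""
    current = list(approx)
    for j in sorted(details):
        finer = []
        for s, d in zip(current, details[j]):
            finer.append((s + d) / SQRT2)
            finer.append((s - d) / SQRT2)
        current = finer
    return current
```

Setting detail coefficients to zero before calling `haar_reconstruct` yields a smoothed version of the signal, which is the basis of the thresholding step that follows.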

In general, wavelet methods have been known to be advantageous
for statistical analyses including the removal of noise when the signal
is inhomogeneous in time. For example, when a series of inhomogeneous
signals includes a long-term variation and noise, the variation can
be captured by large wavelet coefficients and the noise by small
coefficients. The successful removal of noise can be achieved by appropriate
threshold selection. However, in FCS measurements for living cells,
the long-term variation is usually caused by cell or organelle movement,
whereas the ‘noise’ arises from intensity fluctuation caused by molecular
diffusion within a confocal volume. Therefore, we use the term ‘fluctuation
signal’ for such ‘noise’. To derive the fluctuation signals of molecular
diffusion, we first applied discrete wavelet transform of the original
signal $F^c_\ell(t_i) = \{y_0, y_1, \ldots, y_{n-1}\}$ and removed noise (fluctuation signals)
by setting the detail coefficients $d_{j,k}$ to $\delta^{S}_{\lambda_j}(d_{j,k})$ or $\delta^{H}_{\lambda_j}(d_{j,k})$, where
$\delta^{S}_{\lambda_j}$ and $\delta^{H}_{\lambda_j}$ are the soft and hard thresholding functions, respectively,

$$\delta^{S}_{\lambda_j}(x) = \begin{cases} x - \lambda_j, & x > \lambda_j \\ 0, & |x| \le \lambda_j \\ x + \lambda_j, & x < -\lambda_j \end{cases}$$

$$\delta^{H}_{\lambda_j}(x) = \begin{cases} x, & |x| > \lambda_j \\ 0, & \text{otherwise} \end{cases}$$

Because the intensity signals obey a Poisson distribution, the values of
$\lambda_j$ are level-dependent thresholds for Poisson noise, and the
translation-invariant Poisson smoothing using Haar wavelets (TIPSH) algorithm
was used*.
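
The two thresholding rules translate directly into code. In the sketch below the level-dependent thresholds are passed in as a plain dictionary; how TIPSH derives those thresholds for Poisson noise is not reproduced here, so the threshold values in any call are placeholders supplied by the user.

```python
def soft_threshold(x, lam):
    """Shrink x towards zero by lam; values within [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def hard_threshold(x, lam):
    """Keep x unchanged if |x| > lam, otherwise set it to 0."""
    return x if abs(x) > lam else 0.0

def threshold_details(details, lambdas, rule=soft_threshold):
    """Apply a thresholding rule level by level.

    details: {level j: list of detail coefficients d_{j,k}}
    lambdas: {level j: threshold for that level} (hypothetical input)
    """
    return {j: [rule(d, lambdas[j]) for d in coeffs] for j, coeffs in details.items()}
```

Soft thresholding gives a continuous mapping at the cost of shrinking large coefficients, whereas hard thresholding leaves retained coefficients untouched.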

Second, we applied the inverse wavelet transform to the wavelet coefficients
to determine the estimated long-term variation
$\tilde{F}_\ell(t_i) = \{\tilde{y}_0, \tilde{y}_1, \ldots, \tilde{y}_{n-1}\}$. The fluctuation signals are calculated by
subtracting this variation from the original inhomogeneous signals,
$F^c_\ell(t_i) - \tilde{F}_\ell(t_i)$. Finally, the absolute values of fluctuation signals can
be calculated by adding the time average of the variation, $\langle\tilde{F}_\ell(t_i)\rangle$, giving

$$F^{\mathrm{fl}}_\ell(t_i) = \langle\tilde{F}_\ell(t_i)\rangle + \left(F^c_\ell(t_i) - \tilde{F}_\ell(t_i)\right)$$
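
The correction steps of this section can be strung together in a compact sketch. Two simplifications are made and should be read as assumptions: the exponential bleaching curve is fitted by log-linear regression, and the long-term variation is approximated by block means, which corresponds to a Haar coarse approximation with all finer detail coefficients set to zero rather than to the TIPSH estimate. Function names are hypothetical.

```python
import math

def fit_exponential_decay(times, mean_trace):
    """Log-linear least squares for f(t) = f0 * exp(-t / tau_b); needs positive intensities."""
    logs = [math.log(v) for v in mean_trace]
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(logs) / n
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(times, logs))
             / sum((t - t_bar) ** 2 for t in times))
    f0 = math.exp(y_bar - slope * t_bar)
    tau_b = math.inf if slope == 0 else -1.0 / slope
    return f0, tau_b

def correct_bleaching(trace, times, f0, tau_b):
    """F_c(t) = f(0) + (F(t) - f(t)) * sqrt(f(0) / f(t)), with f(t) = f0 * exp(-t / tau_b)."""
    corrected = []
    for f, t in zip(trace, times):
        f_t = f0 * math.exp(-t / tau_b)
        corrected.append(f0 + (f - f_t) * math.sqrt(f0 / f_t))
    return corrected

def block_mean_variation(trace, block):
    """Piecewise-constant estimate of the slow variation (stand-in for wavelet smoothing)."""
    out = []
    for start in range(0, len(trace), block):
        chunk = trace[start:start + block]
        out.extend([sum(chunk) / len(chunk)] * len(chunk))
    return out

def fluctuation_signal(corrected, variation):
    """Time average of the variation plus the residual (corrected minus variation)."""
    mean_variation = sum(variation) / len(variation)
    return [mean_variation + (c - v) for c, v in zip(corrected, variation)]
```

Adding back the time-averaged variation keeps the mean intensity of the trace, so the normalisation of the subsequent autocorrelation is not distorted by the subtraction.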


We have implemented the above method with the Python programming
language (Python Software Foundation, Python v.3.6)**.
The autocorrelation function after noise removal was calculated by
a multiple-tau correlation algorithm”, followed by a nonlinear
least-squares fitting to the theoretical model. The radial and axial waist of
the PSF were determined by measuring fluorescent solutions and were
fixed during fitting.
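
A multiple-tau correlator computes short lags at full time resolution and longer lags on progressively coarsened copies of the trace. The version below is a minimal illustration with hypothetical names and a hypothetical default of eight channels per level; it is not the implementation used in this study.

```python
def multiple_tau(trace, m=8):
    """Return (lags, g), with lags in units of the sampling interval.

    Level 0 covers lags 1..m-1 at full resolution. After each level the trace is
    coarsened by averaging neighbouring pairs, and lags m/2..m-1 (in coarse bins)
    are evaluated, so the lag spacing doubles from level to level. m must be even.
    """
    data = list(trace)
    lags, g = [], []
    bin_width = 1
    start = 1
    while len(data) >= 2 * m:
        for k in range(start, m):
            n = len(data) - k
            head = sum(data[:n]) / n
            tail = sum(data[k:]) / n
            product = sum(data[i] * data[i + k] for i in range(n)) / n
            lags.append(k * bin_width)
            g.append(product / (head * tail) - 1.0)
        data = [(data[2 * i] + data[2 * i + 1]) / 2.0 for i in range(len(data) // 2)]
        bin_width *= 2
        start = m // 2
    return lags, g
```

Because the number of evaluated lags grows only logarithmically with the trace length, the curve spans several decades in lag time at modest cost. The averaging over coarse bins slightly smooths the curve at long lags.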


Reconstitution and microscopy of protein-rich droplets 

Liquid droplets of the Atg13-Atg17-Atg29-Atg31 complex and the Atg1
complex were formed by dilution of proteins from a stock solution into
buffer as follows. In Fig. 2b, liquid droplets of 3.3 μM Atg1 complex
(SNAP-Atg1(D211A), Atg13-SNAP, SNAP-Atg17-Atg29-Atg31) were
formed by dilution of proteins from a stock solution into buffer (final
concentrations: 50 mM HEPES, pH 7.0, 250 mM NaCl) with subsequent
incubation at 25 °C for the indicated times. In Fig. 2c, liquid droplets of
7.5 μM Atg1 complex (Atg1(D211A), Atg13-SNAP, SNAP-Atg17-Atg29-Atg31)
were formed by dilution of proteins from a stock solution into
buffer (final concentrations: 50 mM Bis-Tris-HCl, pH 5.5, 400 mM NaCl)
at 25 °C. In Fig. 2e, 4.8 μM Atg1 complexes (SNAP-Atg1(D211A), Atg13-SNAP,
SNAP-Atg17-Atg29-Atg31) were formed by dilution of proteins
from a stock solution into buffer (final concentrations: 50 mM MES,
pH 6.0, 300 mM NaCl) with subsequent incubation for 30 min at 25 °C. In
Fig. 2g and Extended Data Fig. 2h, liquid droplets of 6 μM Atg13-Atg17-Atg29-Atg31
containing the indicated mutations were formed by dilution
of proteins from a stock solution into buffer (final concentrations:
50 mM HEPES, pH 7.0, 250 mM NaCl) with subsequent incubation at
25 °C for the indicated time. In Extended Data Fig. 2d, liquid droplets
of 6 μM Atg13-Atg17-Atg29-Atg31 were formed by dilution of proteins
from a stock solution into buffer (final concentrations: 50 mM HEPES,
pH 7.0, 250 mM NaCl) with subsequent incubation for 10 min at 25 °C.
Next, 1,6-hexanediol or buffer was added at a final concentration of 5%
to liquid droplets at 25 °C. In Extended Data Fig. 2i, liquid droplets of the
indicated concentration of Atg13-Atg17-Atg29-Atg31 were formed by
dilution of proteins from a stock solution into buffer (final concentrations:
50 mM MES, pH 6.0, 400 mM NaCl) with subsequent incubation
for 3 min at 25 °C. Phase separation was scored ‘droplet’ or ‘no droplet’,
depending on the presence or absence of protein droplets. In Fig. 3b,
liquid droplets of 2.7 μM Atg13-Atg17-Atg29-Atg31 containing either
phosphorylated Atg13 or non-phosphorylated Atg13 were formed by
dilution of proteins from a stock solution into buffer (final concentrations:
50 mM HEPES, pH 7.0, 250 mM NaCl) with subsequent incubation
at 25 °C for the indicated times.

These samples were mixed in a microtube and imaged on a glass-bottom
dish (Mattek) coated with 3% bovine serum albumin (BSA)
(Wako). An FV3000RS was used for fluorescence imaging. 488-nm, 561-nm
and 640-nm lasers were used for excitation of Atg1 labelled with
SNAP-Surface Alexa Fluor 488, Atg13 labelled with SNAP-Surface 549,
and Atg17 labelled with SNAP-Surface Alexa Fluor 647, respectively, and
fluorescence was recorded in linear sequential mode using a galvano
scanner. Quantitative analysis was carried out with Fiji.


In vitro pull-down assay 

Purified proteins were incubated with GST-accept beads (Nacalai 
Tesque) at 4 °C for 30 min. After the beads were washed three times 
with PBS, proteins were eluted by 10 mM glutathione in 50 mM Tris-HCl 
(pH 8.0). The samples were separated by SDS-PAGE. Protein bands 
were detected by One Step CBB (BIO CRAFT). 


Phosphorylation of Atg13 by TORC1 

Purification of TORC1 from yeast was performed as previously
reported”. In the final step of purification, TORC1 was eluted with elution
buffer (250 ng μl⁻¹ Flag peptide (Sigma, F3290), 100 mM NaCl, 31
mM Tris-HCl, pH 7.5). Atg13 (3 μM) was phosphorylated by TORC1 in
reaction buffer (1 mM ATP, 1 mM MgCl2, 1x protease inhibitor cocktail
(Nacalai, 03969-21), 1 mM PMSF, 100 mM NaCl, 50 mM Tris-HCl, pH 7.5)
for 17 h at 20 °C. After the reaction, NaCl was added to a final concentration
of 500 mM. Dilution with buffer (20 mM HEPES, pH 7.0, 500 mM
NaCl) and concentration were repeated for phosphorylated Atg13 to
exchange the buffer. Samples were separated by SDS-PAGE and Zn²⁺-Phos-tag
SDS-PAGE (Wako). Zn²⁺-Phos-tag SDS-PAGE was performed
using 20 μM Phos-tag solution. Protein bands were detected by One
Step CBB. For western blotting, protein bands were detected by C-DiGit
Blot Scanner (LI-COR Biotechnology).


Phosphorylation of the Atg1 complex 

For Extended Data Fig. 3f, liquid droplets of 3.3 μM Atg1 complex containing
either wild-type Atg1 or Atg1(D211A) were formed by dilution
of the protein from a stock solution into buffer (final concentrations:
50 mM HEPES, pH 7.0, 250 mM NaCl) and then incubated for 10 min
at 25 °C. Next, one-tenth volume of ATP solution (10 mM ATP, 10 mM
MgCl2, 10 mM HEPES, pH 7.0, 250 mM NaCl) was added to liquid droplets
at 25 °C. Samples were then collected at the indicated time points. The
samples were separated by SDS-PAGE and detected by One Step CBB.
Fluorescence of the same samples was imaged using FV3000RS on 3%
BSA-coated glass-bottom dishes. Turbidity was measured by sample
optical density at 350 nm using a NanoDrop 2000 (Thermo Scientific).
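
As a worked check of what adding one-tenth volume of the ATP solution does to the final concentrations, the mixing arithmetic can be written out as below. The helper name is hypothetical, and the calculation assumes that "one-tenth volume" refers to one-tenth of the droplet sample volume and that volumes are additive.

```python
def mixed_concentration(c_sample, c_added, added_fraction):
    """Concentration after adding `added_fraction` sample volumes of a second solution."""
    return (c_sample + c_added * added_fraction) / (1.0 + added_fraction)

# Droplet sample: 50 mM HEPES, 250 mM NaCl, no ATP.
# ATP solution: 10 mM ATP, 10 mM HEPES, 250 mM NaCl, added at one-tenth volume.
atp_mM = mixed_concentration(0.0, 10.0, 0.1)
hepes_mM = mixed_concentration(50.0, 10.0, 0.1)
nacl_mM = mixed_concentration(250.0, 250.0, 0.1)
```

Under these assumptions ATP ends up at about 0.9 mM, NaCl is unchanged at 250 mM, HEPES drops slightly to about 46 mM, and the protein complex is diluted by the same factor of 1/1.1.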


Autophosphorylation assays 

For confirming the specificity of anti-T226-P antibodies (Extended 
Data Fig. 3b), ATP (—) samples were prepared by incubating the Atg1 
complex containing either wild-type or T226A SNAP-Atg1-Flag-His6
in 50 mM MES, pH 6.0, 150 mM NaCl for 30 min at 25 °C. ATP (+)
samples were prepared by incubating the same protein complex in 50
mM MES, pH 6.0, 150 mM NaCl for 30 min at 25 °C followed by incubation
with 1 mM ATP-Mg in the same buffer solution for 30 min at 25 °C.
The samples were separated by SDS-PAGE and subjected to western
blotting using anti-T226-P and anti-Flag antibodies. For Fig. 3c, liquid
droplets of 0.2 μM Atg1 complex containing either wild-type Atg13 or
Atg13(F430A) were formed by incubation in 50 mM MES, pH 6.0, 300
mM NaCl for 30 min at 25 °C. A sample comprising Atg1 alone was also
prepared in the same buffer. Next, the ATP-Mg solution was added to 
samples at a final concentration of 1 mM at 25 °C. Samples were then 
collected at indicated time points. Samples were separated by SDS- 
PAGE and subjected to western blotting. Protein bands were detected 
by C-DiGit Blot Scanner (LI-COR Biotechnology). Quantitative analyses 
were carried out using Fiji. 


Dephosphorylation of the Atg1 complex 

For Fig. 3d, e, liquid droplets of 2 μM Atg1 complex were formed by dilution
of the protein from a stock solution into buffer (final concentrations:
50 mM HEPES, pH 7.0, 250 mM NaCl) with subsequent incubation for 10
min at 30 °C. Next, the solution containing Atg1-complex droplets was
supplemented with one-tenth volume of the ATP solution (10 mM ATP,
10 mM MgCl2, 10 mM HEPES, pH 7.0 and 250 mM NaCl) and incubated
for 20 min at 30 °C. Finally, the solution was supplemented with Ptc2
and MnCl2 to final concentrations of 1 μM and 3 mM, respectively, and
incubated for 40 min at 30 °C. Samples were collected at the indicated
time points for further analysis.


Antibodies 

Polyclonal antibodies against S. cerevisiae Atg1 with phosphorylation 
at Thr226 (anti-T226-P antibody) were raised according to a previ- 
ous report’’. Antibodies were raised in rabbits against a phosphoryl- 
ated peptide, FLPNTSLAE[pThr]LCGSPLY, which corresponds to the 
sequence of the activation loop of Atg1. ELISA was performed using 
the peptides with and without phosphorylation at Thr226 to confirm 
the specificity of antibodies against the phosphorylated peptide. The 
peptide synthesis, antibody generation, ELISA, and purification were 
performed by GenScript. Polyclonal antibodies against S. cerevisiae 
Atg13 with phosphorylation at Ser428 and Ser429 (anti-S428/9-P antibody)
were generated as described in a previous study”. Anti-Flag M2
antibody was purchased from Sigma (F3165). Anti-HA antibody was
purchased from MBL (M180-3S). Anti-mouse IgG (Fab specific)-peroxidase
antibody produced in goat was purchased from Sigma (A9917).
Anti-rabbit IgG (whole molecule)-peroxidase antibody produced in
goat was purchased from Sigma (A6154).


Sample preparation for HS-AFM observation 

For HS-AFM imaging, coverslips (24 x 32 mm, 0.13-0.17 mm thick) (Matsunami
Glass) were used as a solid support. Coverslips were immersed
in 5 M KOH solution for 1 h, followed by three washes in Milli-Q water.
Cleaned coverslips were subsequently sonicated in Milli-Q water
for 20 min and stored in ethanol at 4 °C until use. Ethanol was completely
eliminated before each experiment. Atg13-SNAP (1 μM) and
SNAP-Atg17-Atg29-Atg31 (1 μM) were mixed in 20 μl of observation
buffer (250 mM NaCl, 20 mM HEPES-NaOH, pH 7.0) and deposited
onto cleaned coverslips. After a 5-min incubation, excess proteins were
washed out with observation buffer. For observation of data presented
in Fig. 4b and Supplementary Video 8, the coverslip was treated with
0.02% 3-aminopropyltriethoxysilane for 5 min and then washed in
Milli-Q water before deposition of the protein mixture.


HS-AFM imaging 

HS-AFM images were acquired in tapping mode using a tip-scan type
HS-AFM instrument®® (Nano Explorer PS-NEX, Research Institute of Biomolecule
Metrology Co.) equipped with a fluorescence microscope. We
used cantilevers measuring ~9 μm long, ~2 μm wide and ~0.13 μm thick
with a resonant frequency of ~1.5 MHz and a spring constant of 0.1-0.2
N m⁻¹ (BL-AC10DS, Olympus). Scaffold droplets were selected for observation
by fluorescence imaging using SNAP-Surface 549 labelled to
Atg13-SNAP and Alexa Fluor 488 labelled to SNAP-Atg17 and located
to the HS-AFM scanning area (~4 x ~6 μm²) before nanoscale imaging
with HS-AFM. HS-AFM imaging conditions were as follows: scan size,
500 x 250 nm² (Fig. 4a and Supplementary Video 7) or 1000 x 500 nm²
(Fig. 4b and Supplementary Video 8); pixel size, 120 x 60 pixels; imaging
rate, ~3.1 frames per s (fps). All imaging was performed at 23 °C. IGOR
Pro (WaveMetrics) based software for HS-AFM was used to process the
image, using features such as Gaussian filtering, automatic flattening
and fast Fourier transform-frequency filtering”.
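
The imaging conditions above imply a spatial sampling and a line rate that are easy to work out. The helper below is a hypothetical convenience for that arithmetic and assumes that the stated scan size is covered uniformly by the stated number of pixels.

```python
def hsafm_sampling(scan_nm, pixels, fps):
    """Derive per-pixel sampling and timing from scan size (nm), pixel grid and frame rate."""
    (size_x, size_y), (n_x, n_y) = scan_nm, pixels
    return {
        "nm_per_pixel_x": size_x / n_x,
        "nm_per_pixel_y": size_y / n_y,
        "frame_time_s": 1.0 / fps,
        "lines_per_s": n_y * fps,
    }

small_scan = hsafm_sampling((500.0, 250.0), (120, 60), 3.1)
large_scan = hsafm_sampling((1000.0, 500.0), (120, 60), 3.1)
```

For the 500 x 250 nm² scan this gives about 4.2 nm per pixel in both directions, about 0.32 s per frame and about 186 scan lines per second; the 1000 x 500 nm² scan doubles the pixel spacing to about 8.3 nm.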


Preparation of GUVs 

The natural swelling method was used to prepare GUVs containing PEMCC
in a buffer from a dry lipid film*°. To prepare GUVs using electrically
neutral lipids in a buffer of normal physiological ion concentration
(~150 mM NaCl), we used a very small fraction of PEG-lipid (that is, the
PEG-lipid method)*°“". We prepared 200 μl of a 1 mM phospholipid
mixture consisting of 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine
(POPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoethanolamine
(POPE), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine-N-[4-(p-maleimidomethyl)cyclohexane-carboxamide]
(PEMCC), and
1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene
glycol)-2000] (DPPE-PEG2000) (all lipids were purchased from Avanti
Polar Lipids) at a molar ratio of 69:20:10:1 in chloroform in a 5-ml glass
vial, and then produced a homogeneous thin film of lipid mixture by
evaporation of chloroform using the gentle application of nitrogen
gas. For the complete removal of chloroform, we then placed the glass
vial in a vacuum desiccator connected to a rotary pump overnight.
The following day, we prehydrated the thin lipid film on the bottom
of the glass vial using 20 μl of water at 60 °C for 7 min. Thereafter 1 ml
of HEPES buffer (20 mM HEPES, pH 7.0, 150 mM NaCl and 1 mM EGTA)
containing 0.1 M sucrose was added and samples were incubated for
2-3 h at 60 °C.
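
The amount of each lipid in the mixture follows from the total concentration, the volume and the molar ratio. The sketch below assumes that 1 mM refers to total phospholipid; the function name is hypothetical.

```python
def lipid_amounts_nmol(total_mM, volume_ul, molar_ratio):
    """Split total lipid (mM x ul = nmol) among components according to a molar ratio."""
    total_nmol = total_mM * volume_ul
    parts = sum(molar_ratio.values())
    return {name: total_nmol * share / parts for name, share in molar_ratio.items()}

amounts = lipid_amounts_nmol(
    total_mM=1.0,
    volume_ul=200.0,
    molar_ratio={"POPC": 69, "POPE": 20, "PEMCC": 10, "DPPE-PEG2000": 1},
)
```

Under that assumption the 200 nmol of lipid comprises 138 nmol POPC, 40 nmol POPE, 20 nmol PEMCC and 2 nmol DPPE-PEG2000.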


Observation of GUVs 

For dilution, 200 μl of GUV solution was added to 800 μl of HEPES
containing 0.1 M glucose solution (external solution) into a hand-made
microchamber that was formed on a glass slide (Matsunami Glass) by
depositing (in parallel) two bar-shaped silicon-rubber (3-mm silicon
sheet) spacers between a cover slip (Micro cover glass, Muto Pure
Chemicals) and the glass slide**. The microchamber was coated with
0.10% (w/v) BSA prepared in the same buffer used in experiments to
avoid strong contact of GUVs with the glass surface. To increase the
contrast of the GUVs for DIC observation, the interior and exterior
of GUVs were filled with 0.1 M sucrose or 0.1 M glucose, respectively.
Observation of GUVs was performed using a FV3000RS as described
above at room temperature (~23 °C).


Single GUV method for observation of GUV-protein interactions
The single-giant unilamellar vesicle (GUV) method, by which proteins
can be precisely added one by one in the vicinity of a GUV, was used for
observation of GUV-protein interactions. Purified proteins in HEPES
buffer containing 0.1 M glucose were added slowly one by one into
the vicinity of a single GUV through a 12-15 μm diameter glass micropipette,
the position of which was controlled by a micromanipulator
(Narishige) at room temperature”. The distance between the GUV
and the tip of the micropipette was maintained at ~50 μm. The glass
micropipette was prepared as follows: first we pulled a glass tube of
1.0-mm diameter to a needle point using a puller, and the needle point
was then microforged to the desired tip diameter (all equipment from
Narishige). Proteins in the external solution of GUVs were filled in the
micropipette by aspiration using a vacuum pump (ULVAC KIKO), and
then the micropipette was held by a micromanipulator, giving us fine
control over tip positioning in the vicinity of the GUV. Protein application
pressure in the vicinity of the GUV was controlled by changing the
height of a vertical column of water to which the micropipette was
hydraulically connected*’. The application pressure was measured
using a differential pressure transducer (Validyne), pressure amplifier
(Karone), and a digital multimeter. For the constant application of proteins
in the vicinity of GUVs using the micropipette, we first determined
the equilibrium pressure by bringing the tip of the micropipette near to
a small vesicle and adjusting the pressure to keep the vesicle at the tip
of the micropipette. After fixing equilibrium pressures, an additional
~300 mV pressure was added to constantly apply protein solution in
the vicinity of GUVs.


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


All relevant data are available from the authors. Source data for gels 
and blots are provided as Supplementary Information. Source Data 
for graphs are provided with the paper.

Extended Data Fig. 1| Characterization of the PAS using fluorescence
microscopy. a, Measurement of alkaline phosphatase activity of yeast cells
overexpressing GFP-Atg13 performed based on previous reports“. Data are
mean ± s.d. (n = 3 independent experiments). Bar colour indicates hours after
onset of nitrogen starvation. b, Dissolution of Atg13-GFP puncta after addition
of nitrogen source. Experiments were repeated independently twice with
similar results. c, Two additional examples of GFP-Atg13 fluorescence
recovery, related to Fig. 1a. Experiments were repeated independently three
times with similar results, which are shown here and in Fig. 1a. d, Fluorescence
of endogenously expressed Atg1-GFP, Atg13-GFP and Atg17-GFP puncta
rapidly recovers after photobleaching. Rapa and -N indicate rapamycin
treatment and nitrogen starvation, respectively. Experiments were repeated
independently twice with similar results. e, Kymograph of FCS data shown in
Fig. 1b. f, Two additional examples of partial fluorescence recovery of giant
GFP-Atg13 droplets, related to Fig. 1d. DIC of Fig. 1d experiment is also shown.
Experiments were repeated independently three times with similar results,
which are shown here and in Fig. 1d. g, Coalescence of GFP-Atg13 puncta
observed after removing 1,6-hexanediol. The images are the sum of five z-slices
of GFP fluorescent images. Experiments were repeated independently three
times with similar results. h, Two additional examples of the coalescence of two
PAS precursors, related to Fig. 1g. Experiments were performed three times
with similar results, which are shown here and in Fig. 1g. i, An additional
example of Ostwald ripening of the PAS, related to Fig. 1h, i. The images are the
sum of four z-slices of GFP fluorescent images. The bottom graph shows the
line profile of fluorescence intensity in the top image. Experiments were
repeated independently twice with similar results, which are shown here and in
Fig. li. Scale bars, 21m, exceptinf, 1pm. 


Extended Data Fig. 2 | Characterization of the Atg1-complex droplets 
in vitro. a, Domain organization of Atg1-complex components. Grey regions 
indicate IDRs consisting of ten or more residues predicted to be disordered by 
DISOPRED. Bar length is approximately proportional to the number of 
residues. b, SDS-PAGE of purified SNAP-tagged proteins used for in vitro 
analyses. Experiments were repeated independently twice with similar results. 
For gel source data, see Supplementary Fig. 1. c, An additional example of 
coalescence of Atg1-complex droplets observed in vitro, related to Fig. 2c. The 
right panel shows the change of the aspect ratio during coalescence. 
Experiments were repeated independently twice with similar results, which are 
shown here and in Fig. 2c. d, Formation of liquid droplets of the scaffold 
complex and their dissociation by 1,6-hexanediol treatment. Experiments were 
repeated independently six times with similar results. e, Quantification of the 
residual droplet area in d. Data are mean ± s.d. (n = 6 independent experiments). 
****P=3,9x10°°, two-sided t-test. f, The effect of pH on the formation of 
scaffold droplets. The concentrations of NaCl and Atg13-Atg17-Atg29-Atg31 
are 500 mM and 4 μM, respectively. The experiment was repeated 
independently three times with similar results. g, Phase diagram of the 
formation of scaffold droplets at indicated NaCl concentrations and pH values. 
The protein concentration is 4 μM. Experiment was performed once. h, Time- 
course analysis of droplet area in Fig. 2g. Data are mean ± s.d. (n = 3 
independent experiments). i, Phase diagram of droplet formation upon mixing 
of Atg13 and Atg17-Atg29-Atg31 at indicated protein concentrations. 
Representative images at a, b and c in the diagram are shown above the 
diagram. Experiment was performed once. 

Extended Data Fig. 3 | In vitro assay determination of Atg1 and Ptc2 
phosphoregulatory activity. a, Purification of TORC1 from yeast. The 
experiment was repeated independently twice with similar results. 
b, Confirmation of the specificity of the anti-T226-P antibodies. Experiment 
was performed once. c, Quantification of the results in Fig. 3c. Data are 
mean ± s.d. (n = 3 independent experiments). d, e, Phosphorylation-mediated 
band shifts of Atg1, Atg13 and Atg29 upon incubation of the Atg1 complex with 
ATP analysed by conventional (d) and Phos-tag SDS-PAGE (e). Experiment was 
repeated independently three times with similar results. f, Effect of Atg1- 
mediated phosphorylation on Atg1-complex droplets. Bottom graph shows 
time-course analysis of droplet area. Data are mean ± s.d. (n = 3 independent 
experiments). g, SDS-PAGE of the recombinant Ptc2 used in this study. 
Experiments were repeated independently twice with similar results. h, Mn²⁺- 
dependent dephosphorylation of Atg1 and Atg13 by Ptc2. Experiment was 
performed once. For gel source data, see Supplementary Fig. 1 (a, b, d, e, g, h). 

Extended Data Fig. 4 | In vitro observation of droplet maturation. 
Photobleaching experiments were performed for the scaffold droplets in vitro. 
White and yellow arrow heads indicate photobleached and non-bleached 
droplets, respectively. Photobleaching was performed between 0 and 3 s. 
Non-bleached droplets coalesce, whereas bleached droplets do not. Experiments 
were repeated independently twice with similar results. Scale bars, 5 μm. 

Extended Data Fig. 5 | Droplets tethered to GUVs via specific Atg13-Vac8 
interaction retain their liquid-like nature. a, b, Deletion of VAC8 results in 
mislocalization of Atg5–GFP puncta away from the vacuole (a) and GFP-Atg8 
puncta (b). Atg5-GFP and GFP-Atg8 were observed following 3 or 5 h 
rapamycin incubation, respectively. Data are mean ± s.d. (n = 3 independent 
experiments). *P = 0.0119, **P = 0.0044, two-sided t-test. c, Lack of tethering of 
Atg1-complex droplets to Vac8-free GUVs. Experiments were repeated 
independently 20 times with similar results. d, Impaired interaction of the Vac8 
mutant with Atg13 demonstrated by in vitro pull-down assay. Experiments 
were repeated independently three times with similar results. For gel source 
data, see Supplementary Fig. 1. e, Near complete lack of tethering of scaffold 
droplets to Vac8 mutant-anchored GUVs. Experiments were repeated 
independently 30 times with similar results. f, Additional examples of time- 
dependent change in the number and size of scaffold droplets on Vac8-GUVs, 
related to Fig. 5g. The number and average area of droplets ± s.d. (n = droplet 
numbers) are shown. Experiments were repeated independently five times 
with similar results, which are shown here and in Fig. 5g. g, FRAP experiments 
of Atg1-SNAP in Atg1-complex droplets attached to Vac8-anchored 
multilamellar vesicles. Multilamellar vesicles were used instead of GUVs to 
reduce the movement of droplets on vesicles. The bottom graph indicates the 
ratio of fluorescence intensity at each time point in comparison to the initial 
intensity. Data from seven independent experiments are shown. 
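
The FRAP readout described for panel g, the ratio of fluorescence intensity at each time point to the initial intensity, can be illustrated with a minimal Python sketch. The function name and the intensity values below are hypothetical and serve only as an illustration; the reporting summary states that the authors' own FRAP analysis was done in Excel, and this sketch is not their procedure.

```python
from typing import List, Sequence


def normalize_to_initial(intensities: Sequence[float]) -> List[float]:
    """Divide every intensity by the first (pre-bleach) value."""
    if not intensities:
        raise ValueError("need at least one intensity value")
    initial = intensities[0]
    if initial <= 0:
        raise ValueError("initial intensity must be positive")
    return [value / initial for value in intensities]


# Hypothetical trace (arbitrary units): pre-bleach, bleach, then recovery.
example_trace = [200.0, 40.0, 90.0, 130.0, 150.0]
recovery_ratio = normalize_to_initial(example_trace)
print(recovery_ratio)
```

A ratio that climbs back towards 1 indicates recovery of fluorescence in the bleached region; the plateau value gives the recovered fraction for that trace.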
Extended Data Fig. 6 | Proposed model of the PAS and its regulation by 
phosphorylation. Under growing conditions, hyperphosphorylation of Atg13 
by TORC1 inhibits Atg1-complex formation and phase separation. Upon 
starvation, TORC1 activity is inhibited and Atg13 is dephosphorylated by PP2C 
phosphatases, which leads to Atg1-complex formation. The Atg1 complex then 
undergoes phase separation to form a liquid droplet (early PAS), which is 
tethered to the vacuolar membrane through Atg13-Vac8 interaction. The early 
PAS activates Atg1 kinase by accelerating autophosphorylation and at the same 
time recruits downstream Atg factors, thereby transforming into the mature 
PAS from which isolation membrane is generated. Continuous 
phosphorylation and dephosphorylation of Atg13 at the PAS would contribute 
to maintaining the liquid property of the PAS. 

Statistical parameters 


When statistical analyses are reported, confirm that the following items are present in the relevant location (e.g. figure legend, table legend, main 
text, or Methods section). 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


An indication of whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistics including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND 
variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Clearly defined error bars 
State explicitly what error bars represent (e.g. SD, SE, CI) 


Our web collection on statistics for biologists may be useful. 


Software and code 


Policy information about availability of computer code 


Data collection Fluorescence and DIC images were obtained by FV31S-SW ver. 2.3.1.163. 
AFM images were obtained by FalconTS_1.0.0. 
Blot images were obtained by Image Studio Ver. 4.0.21. 
SDS-PAGE images were obtained by Image Lab Ver. 4.0 build 16. 


Data analysis Image processing: FIJI ver. 1.52e, FV31S-SW ver. 2.1.1.98 
FCS analysis: Python ver. 3.6 
AFM image processing: Falcon Viewer, IGOR Pro ver. 6.2.2.2 
IDR prediction: DISOPRED (http://bioinf.cs.ucl.ac.uk/psipred/) 
FRAP analysis: Excel Office 365 
Coalescence analysis: Origin 2019 (9.60) 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers 
upon request. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


Data that support the findings of this study are available from the corresponding author upon reasonable request. 


Field-specific reporting 


Please select the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences 


For a reference copy of the document with all sections, see nature.com/authors/policies/ReportingSummary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size Sample size for yeast experiments was determined by referring to commonly used size. 

Data exclusions No exclusion. 

Replication Quantitative FRAP experiments were repeated three times with consistent data. FCS analyses were performed for dozens of cells in order to 
obtain statistically reliable data. Observation of coalescence of droplets was repeated three times both in vivo and in vitro with consistent 
data. Observation of Ostwald ripening was repeated three times with consistent data. Phosphorylation and dephosphorylation experiments in 
vitro were repeated three times with consistent data. HS-AFM observations of scaffold droplets were repeated three times with consistent 
data. GUV experiments were repeated at least five times with consistent data. Observation of PAS localization in yeast cells was repeated 
three times with consistent data. 


Randomization Randomization is not relevant for this study because our work does not involve clinical trials or population studies. 


Blinding Blinding is not relevant for this study because no group allocation was performed. 


Reporting for specific materials, systems and methods 


Materials & experimental systems 
n/a | Involved in the study 
Unique biological materials 
Antibodies 
Eukaryotic cell lines 
Palaeontology 
Animals and other organisms 
Human research participants 

Methods 
n/a | Involved in the study 
ChIP-seq 
Flow cytometry 
MRI-based neuroimaging 

Antibodies 


Antibodies used Anti-FLAG M2 (F3165; Sigma; Lot#SLBN8915V)(dilution 1:1000), Anti-HA TANA2 (M180-3S; MBL; Lot#003)(dilution 1:10000), 
Anti-Mouse IgG (Fab specific)—Peroxidase antibody, polyclonal (A9917; Sigma; Lot#O81M4762)(dilution 1:20000), Anti-Rabbit IgG 
(whole molecule)—Peroxidase antibody, polyclonal (A6154; Sigma; Lot# SLBK2462V)(dilution 1:20000), Anti-Atg1-T226-P 
(produced in this study)(dilution 1:2000), Anti-Atg13-S428/9-P (Nat Struct Mol Biol 21, 513, 2014)(dilution 1:2000) 


Validation Validation for anti-FLAG M2 was performed by detecting a single band of protein on a western blot from an E. coli crude 
cell lysate: https://www.sigmaaldrich.com/content/dam/sigma-aldrich/docs/Sigma/Datasheet/f3165dat.pdf; validation for 
anti-HA was performed by detecting a band of N-terminal HA fusion proteins in cell extracts: 
https://www.sigmaaldrich.com/content/dam/sigma-aldrich/docs/Sigma/Datasheet/2/h3663dat.pdf; anti-mouse IgG: 
https://www.sigmaaldrich.com/content/dam/sigma-aldrich/docs/Sigma/Datasheet/6/a9917dat.pdf; anti-rabbit IgG: 
https://www.sigmaaldrich.com/content/dam/sigma-aldrich/docs/Sigma/Datasheet/6/a0545dat.pdf; 
validation data for Anti-Atg13-S428/9-P are provided in Nat Struct Mol Biol 21, 
513, 2014; validation data for anti-Atg1-pThr226 are provided in extended data Fig. 2b 

Proteins of the bromodomain and extra-terminal (BET) domain family are epigenetic 
readers that bind acetylated histones through their bromodomains to regulate gene 
transcription. Dual-bromodomain BET inhibitors (DbBi) that bind with similar 
affinities to the first (BD1) and second (BD2) bromodomains of BRD2, BRD3, BRD4 and 
BRDt have displayed modest clinical activity in monotherapy cancer trials. A reduced 
number of thrombocytes in the blood (thrombocytopenia) as well as symptoms of 
gastrointestinal toxicity are dose-limiting adverse events for some types of DbBi. 
Given that similar haematological and gastrointestinal defects were observed after 
genetic silencing of Brd4 in mice, the platelet and gastrointestinal toxicities may 
represent on-target activities associated with BET inhibition. The two individual 
bromodomains in BET family proteins may have distinct functions and different 
cellular phenotypes after pharmacological inhibition of one or both bromodomains 
have been reported, suggesting that selectively targeting one of the 
bromodomains may result in a different efficacy and tolerability profile compared 
with DbBi. Available compounds that are selective to individual domains lack 
sufficient potency and the pharmacokinetic properties that are required for in vivo 
efficacy and tolerability assessment. Here we carried out a medicinal chemistry 
campaign that led to the discovery of ABBV-744, a highly potent and selective inhibitor 
of the BD2 domain of BET family proteins with drug-like properties. In contrast to the 
broad range of cell growth inhibition induced by DbBi, the antiproliferative activity of 
ABBV-744 was largely, but not exclusively, restricted to cell lines of acute myeloid 
leukaemia and prostate cancer that expressed the full-length androgen receptor (AR). 
ABBV-744 retained robust activity in prostate cancer xenografts, and showed fewer 
platelet and gastrointestinal toxicities than the DbBi ABBV-075. Analyses of RNA 
expression and chromatin immunoprecipitation followed by sequencing revealed 
that ABBV-744 displaced BRD4 from AR-containing super-enhancers and inhibited 
AR-dependent transcription, with less impact on global transcription compared with 
ABBV-075. These results underscore the potential value of selectively targeting the 
BD2 domain of BET family proteins for cancer therapy. 


Analysis of historical time-resolved fluorescence resonance energy 
transfer (TR-FRET) data from approximately 2,500 compounds from 
our DbBi program identified ethyl amide 1, with a modest 17x selectivity 
for the BD2 of BRD4 compared with the BD1 of BRD4. Although activity 
against the BD2 of BRD4 for compound 1 (1.2 nM) was not improved 
compared with the DbBi ABBV-075 (1.3 nM), activity against the BD1 of 
BRD4 was reduced compared with ABBV-075 (22 nM for 1, 2.8 nM for 
ABBV-075). Replacement of the 2,4-difluorophenyl moiety of 1 with a 
2,6-dimethylphenyl ether further impaired BD1 activity, resulting in the 
110x BD2-selective pyrrolopyridone 2 (124 nM and 1.1 nM for BD1 and 
BD2 of BRD4, respectively). Continued optimization of BD2 selectivity, 
metabolic stability and physical properties afforded a tertiary alcohol 
on the central phenyl ring in place of the ethyl sulfonamide of ABBV-075, 
substantially affecting the activity against BD1 of BRD4 (520 nM 
for ABBV-744). Addition of a fluorine atom to the phenyl ether resulted 
in improved pharmacokinetic properties, leading to the discovery of 
ABBV-744 (Fig. 1a). 

Fig. 1 | ABBV-744 is a potent and highly selective inhibitor of the BD2 domain 
of BET family proteins. a, Chemical structure of indicated compounds. 
b, Co-crystal structure of ABBV-744 (pink) in complex with BD2 of BRD2. 

ABBV-744 potently inhibited the BD2 domain of BET family proteins 
with more than 290x selectivity relative to the BD1 domains of BRD2, 
BRD3 and BRD4, and more than 95x selectivity compared with BD1 
of BRDt using TR-FRET, displayed K_D values of 3,300 nM and 2.1 nM in 
surface plasmon resonance experiments and half-maximum inhibitory 
concentrations (IC50) of 20,700 nM and 27.5 nM using NanoBRET 
assays for BD1 and BD2 of BRD4, respectively (Extended Data Fig. 1a, 
b). ABBV-744 also lacked significant activity against 75 kinases and 
22 bromodomain-containing proteins that represent diverse branches 
of the kinome and bromodome (Extended Data Fig. 1c and Supplementary 
Tables 1, 2). ABBV-744 is primarily metabolized by CYP3A4 and 
shows oral bioavailability, enabling in vivo efficacy and tolerability 
studies (Extended Data Fig. 1d, e). 
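
As a rough worked example, the surface plasmon resonance and NanoBRET values quoted above can be turned into fold-selectivity figures by dividing the BD1 value by the BD2 value. The short Python sketch below assumes that simple ratio; the function name and dictionary labels are illustrative and do not come from the paper.

```python
def fold_selectivity(bd1_nm: float, bd2_nm: float) -> float:
    """Ratio of the BD1 value to the BD2 value (same units, both positive)."""
    if bd1_nm <= 0 or bd2_nm <= 0:
        raise ValueError("values must be positive")
    return bd1_nm / bd2_nm


# Values quoted in the text for BD1 and BD2 of BRD4, in nM.
assays = {
    "SPR": (3300.0, 2.1),
    "NanoBRET": (20700.0, 27.5),
}

selectivity = {
    name: fold_selectivity(bd1, bd2) for name, (bd1, bd2) in assays.items()
}
for name, fold in selectivity.items():
    print(f"{name}: about {fold:.0f}-fold BD2 over BD1")
```

On these numbers the ratio is roughly 1,570-fold by surface plasmon resonance and roughly 750-fold by NanoBRET, in line with the TR-FRET statement of more than 290x selectivity.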

The crystal structures of ABBV-744 complexed with both the BD2 and 
BD1 of BRD2 established the binding mode of ABBV-744 that underlies 
its BD2 selectivity (Fig. 1b–d and Extended Data Table 1). ABBV-744 
maintains all of the important interactions found for canonical DbBi, 
including binding of the pyrrolopyridone with the conserved Asn156 
residue, placement of the N-methyl moiety in the amphipathic water 
pocket, and positioning of an aryl ring in the WPF shelf in both BD2 and 
BD1 (Fig. 1b, c). The ethyl amide moiety of ABBV-744 exploits the Asp160 
(BD1) and His433 (BD2) divergence conserved across all bromodomain 
BET family members by burying the amide in a channel formed by the 
His433, Tyr386 and Pro430 residues of BD2 (Fig. 1b), a binding interaction 
that is not available in BD1 (Fig. 1c). The 2,6-dimethylphenyl ether 
moiety of ABBV-744 targets the subtle size distinction of the Ile162 
(BD1) and Val435 (BD2) sequence differences. Thus, incorporation of a 
dimethylphenyl ether moiety forces an aryl methyl group to be buried 
in the rigid base of the WPF shelf. The smaller BD2 Val435 residue can 
accommodate this added methyl group interaction without disruption 
of binding, and therefore binding potency is maintained. For the 
BD1 protein, however, interaction of this aryl methyl group with the 
larger Ile162 residue forces the inhibitor to shift slightly away from 
the Ile moiety, causing a subtle change in the placement of both the 



c, Co-crystal structure of ABBV-744 (blue) in complex with BD1 of BRD2. 
d, Overlay of the co-crystal structure of ABBV-744 in complex with BD2 of BRD2 
(pink) and with BD1 of BRD2 (blue), displayed on the BRD2 BD1 protein (green). 


aryl group and the hydroxy group of the tertiary alcohol, leading to a 
less-optimal binding interaction (overlay in Fig. 1d) and a decrease in 
the potency of ABBV-744 with BD1. 

We tested ABBV-744 in 59 cancer cell lines that are sensitive to DbBi 
and found that ABBV-744 retained robust antiproliferative activity 
(IC50 < 100 nM) mostly, but not exclusively, in acute myeloid leukaemia 
cells and a subset of prostate cancer cells that expressed the full-length 
AR, but not those expressing AR-V7 or that were AR-negative (Fig. 2a, 
Extended Data Table 2 and Supplementary Fig. 1). Similar to ABBV-075 
and the AR antagonist enzalutamide, ABBV-744 induced cell cycle 
arrest in G1 followed by senescence in LNCaP cells (Fig. 2b). Narrow 
antiproliferative activity was also observed for a structurally distinct 
compound, A-083, across 240 cancer cell lines (Extended Data Fig. 2a–c 
and Supplementary Table 3). BD2 inhibitor compounds 74, 75 and RVX-208 
displayed a similar albeit weaker antiproliferative trend, probably 
owing to their moderate selectivity and weaker binding affinity 
(Extended Data Fig. 2d). Relative to ABBV-075, ABBV-744 demonstrated 
limited potency in viability assays of megakaryocyte colony forming 
units (Mk-CFU) in mice and IEC-6 cells, which are potential surrogate 
assays for platelet production and proliferation of normal intestinal 
epithelium, respectively (Extended Data Fig. 2e). 

In ABBV-744-sensitive LNCaP cells, ABBV-744 elicited far fewer gene 
expression changes than ABBV-075 at doses at which BD2 of BRD4 was 
similarly inhibited or the reported doses of the DbBi JQ1 and iBET 
(Fig. 2c and Extended Data Fig. 3a). For example, at a BD2-selective 
concentration (48 nM), ABBV-744 downregulated ACPP (also known 
as ACP3) and MYC but did not affect the ABBV-075-responsive genes 
HEXIM1, SPDEF and ZG16B (Fig. 2d and Extended Data Fig. 3b). The 
BD2-dependent genes KLK2 and MYC were also partially inhibited by 
a potent and selective BD1 inhibitor described in the patent literature 
(BD1i), suggesting that these BD2-dependent genes were in part also 
dependent on BD1, and combined blockade of both domains mimicked 
the activity of ABBV-075 (Extended Data Fig. 3c). When used at a 
high concentration (6 μM), ABBV-744 probably engaged both BD1 and 
BD2, thus recapitulating the activities of ABBV-075 against all genes 
(Fig. 2d). Notably, this small set of 241 ABBV-744-regulated genes 
is highly enriched in dihydrotestosterone (DHT)-responsive genes 
(Fig. 2e and Supplementary Table 4). Gene set enrichment analysis 
also revealed common regulation of AR, MYC and E2F1 hallmarks by 
ABBV-744, enzalutamide and ABBV-075, similar to reports for JQ1 and 
iBET (Extended Data Fig. 3d). Although both ABBV-744 and ABBV-075 

Fig. 2 | ABBV-744 exhibits potent antiproliferative activity against AR- 
positive prostate cancer cells and inhibits AR-dependent transcription. a, 
The antiproliferative IC50 values across cancer cell lines after treatment with 
ABBV-075 or ABBV-744 for 5 days. b, ABBV-744 induced cell cycle arrest (left, 
72 h, 60 nM ABBV-075 or 90 nM ABBV-744; concentrations that elicited similar 
degrees of inhibition of BD2 of BRD4) and senescence (right, 12 days). ENZ, 
enzalutamide. Data are mean ± s.d. (n = 3 biologically independent samples) 
and are representative of n = 3 independent experiments. Representative 
images of β-galactosidase staining of cells at 100x magnification are shown in 
the top right. c, Number of significantly regulated genes (fold change in 
expression > 2-fold, P < 0.01, n = 2, statistical analysis by DESeq2 algorithm) and 
scatter plot of log2-transformed fold change in expression after 24 h treatment 
compared with DHT stimulation alone in phenol red-free, charcoal stripped 
serum (vehicle, 5 nM DHT, 5 nM DHT and 60 nM ABBV-075, or 5 nM DHT and 
90 nM ABBV-744). Genes significantly regulated by both ABBV-075 and 
ABBV-744 or by individual compounds were labelled as ABBV-075 and 


prominently downregulated the DHT signature, ABBV-075 induced 
a broader distribution of expression alterations than ABBV-744 and 
affected hallmarks that were not affected by ABBV-744 (Fig. 2f and 
Extended Data Fig. 3d). Collectively, these results suggest that ABBV- 
744 significantly inhibited AR-dependent transcription in LNCaP cells 
while having a lower impact on global transcription than ABBV-075. 
DbBi has been shown to downregulate AR protein expression in 
some but not all experimental settings, probably owing to subtle dif- 
ferences in cell lines and exact experimental conditions. In our hands, 
neither ABBV-075 nor ABBV-744 reduced AR protein levels in LNCaP 
cells (Extended Data Fig. 3b). Given the lack of a direct effect on the 
AR protein, genome-wide AR and BRD4 occupancy was determined 
to understand the sensitivity of AR-dependent transcription to ABBV-744 
in LNCaP cells. ABBV-075 but not ABBV-744 caused AR peak loss 
similar to JQ1 treatment (Extended Data Fig. 4a). Dependency profiles 
from the DepMap portal (https://depmap.org/portal/) indicated that 
prostate cancer cell lines are significantly more dependent on BRD4 
than BRD2 or BRD3, and higher BRD4 dependency is associated with 
higher sensitivity to ABBV-744 (Extended Data Fig. 4b), collectively 
suggesting that BRD4 may be the primary BET family driver of prostate 
cancer cell line viability and an important target of ABBV-744. ABBV-744 
displayed a globally weaker but otherwise similar pattern of BRD4 peak 
displacement relative to ABBV-075 and JQ1, and preferentially 
downregulated genes associated with super-enhancers similar to DbBi 

ABBV-744, ABBV-075 only, or ABBV-744 only. d, Expression of BD2-sensitive and 
-insensitive genes quantified using the branched DNA (bDNA) assay after 
treatment for 24 h with ABBV-075 or ABBV-744 in the presence of 5 nM DHT. 
Data are mean ± s.d. (n = 3 biological replicates) and are representative of n = 2 
independent experiments. e, Heat map of DHT-induced gene expression 
alterations (DHT signature, fold change in expression of >2, P < 0.01 for DHT 
versus vehicle, n = 2) and the response of these DHT signature genes to 
treatment with enzalutamide, ABBV-075 or ABBV-744 in DHT-stimulated cells. 
f, Genes significantly (¢< 0.01) regulated by ABBV-075 or ABBV-744 in DHT- 
stimulated cells were classified as DHT-regulated genes (overlapping with the 
DHT signature) or non-DHT regulated genes (outside of the DHT signature). 
The distribution of log2-transformed fold changes in expression is shown as a 
split violin plot. The long solid line represents the mean fold change. The small 
lines represent individual data points. The dotted line represents the overall 
average. Statistical significance between all and DHT was determined by 
two-tailed unpaired Student's t-test; P values were calculated by DESeq2. 
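
The gene labelling described for panel c (fold change in expression greater than 2-fold and P < 0.01, with genes labelled as regulated by both compounds or by one only) can be sketched in Python as follows. This is only an illustration of the thresholding step: the gene names and values are hypothetical, the treatment of the cutoff as applying to changes in either direction is an assumption, and the P values themselves would come from DESeq2 rather than from this code.

```python
import math
from typing import Dict, Tuple

# Cutoffs quoted in the caption: fold change > 2 and P < 0.01.
FOLD_CHANGE_CUTOFF = 2.0
P_VALUE_CUTOFF = 0.01


def is_significant(log2_fold_change: float, p_value: float) -> bool:
    """True when the change exceeds 2-fold (assumed: up or down) and P < 0.01."""
    exceeds_fold = abs(log2_fold_change) > math.log2(FOLD_CHANGE_CUTOFF)
    return exceeds_fold and p_value < P_VALUE_CUTOFF


def classify(abbv075: Tuple[float, float], abbv744: Tuple[float, float]) -> str:
    """Label a gene from its (log2 fold change, P value) under each compound."""
    sig_075 = is_significant(*abbv075)
    sig_744 = is_significant(*abbv744)
    if sig_075 and sig_744:
        return "ABBV-075 and ABBV-744"
    if sig_075:
        return "ABBV-075 only"
    if sig_744:
        return "ABBV-744 only"
    return "not significant"


# Hypothetical genes: ((log2FC, P) with ABBV-075, (log2FC, P) with ABBV-744).
hypothetical_results: Dict[str, Tuple[Tuple[float, float], Tuple[float, float]]] = {
    "GENE_A": ((-2.0, 0.001), (-1.5, 0.002)),
    "GENE_B": ((-1.8, 0.001), (-0.3, 0.400)),
    "GENE_C": ((0.2, 0.500), (-1.2, 0.005)),
}
labels = {gene: classify(a, b) for gene, (a, b) in hypothetical_results.items()}
print(labels)
```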


(Fig. 3a–c and Extended Data Fig. 4c). A subset of BRD4 peaks 
lapped with AR peaks, and notably 43% of the BRD4/AR co-occupied 
sites were in super-enhancers (Extended Data Fig. 4d). Interestingly, 
BRD4 was highly bound at AR-occupied super-enhancers relative to 
non-AR super-enhancers, and ABBV-744 and ABBV-075 both effectively 
displaced BRD4 from the AR-containing super-enhancers, suggesting 
an increased dependence of BRD4-AR co-occupied super-enhancers on 
BD2 (Fig. 3d). Further integrating the BRD4-binding profile with gene 
regulation by AR, ingenuity pathway analysis and motif analysis revealed 
enrichment of DHT pathway and androgen-response elements within 
super-enhancers from which BRD4 was displaced by both ABBV-744 and 
ABBV-075 (Extended Data Fig. 5a, b). For example, ABBV-744 displaced 
BRD4 from BRD4-AR co-occupied super-enhancers that are closely 
associated with AR-dependent genes and inhibited KLK2 expression 
(Fig. 3e-g and Extended Data Fig. 3b). Similarly, ABBV-744 significantly 
affected BRD4 occupancy on super-enhancers associated with BD2- 
sensitive ACPP but not BD2-insensitive ZG16B (Extended Data Fig. 5c). 

To understand the sensitivity of BRD4–AR co-occupied super-enhancers 
to ABBV-744, we tested BD2 dependency of the reported 
BRD4–AR interaction. A small but reproducibly detectable fraction of 
BRD4 was found in complex with AR. This DHT- and acetylation-dependent 
interaction was disrupted by ABBV-744 and ABBV-075. By contrast, 
the reported interactions of BRD4 with CDK9, GATA2 or CDK9/cyclin T1 
with HEXIM1 were not BD2 dependent (Extended Data Fig. 6a–c). 


[Fig. 3 image: panel a, Transcription start sites; panel b, Enhancers (H3K27Ac, AR and BRD4 heat maps); panel c, log2 FC in expression for SE versus other; panel d, BRD4 profiles; panels e–g, gene tracks, BRD4 ChIP-qPCR and KLK2 RT-qPCR.] 


Fig. 3 | ABBV-744 displaces BRD4 from AR-containing super-enhancers. 
LNCaP cells were incubated with 5 nM DHT and vehicle (DMSO), 60 nM 
ABBV-075 or 90 nM ABBV-744 for 6 h, and cells were collected for ChIP-seq to 
determine H3K27Ac, AR and BRD4 chromatin association. a, b, Rank-ordered 
heat maps of H3K27Ac, AR and BRD4 peaks at transcription start sites or 
enhancers after the indicated treatment. Rows are ordered according to the 
vehicle-treated BRD4 maximum for each region and centred ±10 kb of the 
BRD4 peak after treatment with vehicle. Colour scales depict reads per million 
(RPM) intensities. Bottom profile plots display log-transformed fold change in 
RPM/bp compared with control. BRD4 ChIP experiments were normalized to 
spike-in controls. c, Quantification of log2-transformed fold change in 
expression after ABBV-075 or ABBV-744 treatment for genes associated with 
super-enhancers (SE) or non-super-enhancers (other). For all box plots, centre 
line indicates the median; box limits are the first and third quartiles; whiskers 

Notably, AR acetylation at the K630LKK633 motif that resembles BET
bromodomain-binding sites in histones has been shown to be important
for AR activity. Considering that the N-terminal domain of AR has been
shown to bind to BD1 directly, we speculated that acetylated AR may
interact cooperatively with both BD1 and BD2 of BRD4 at AR-BRD4
co-occupied super-enhancers to regulate a subset of AR-dependent
genes that are therefore sensitive to BD2 inhibition (Extended Data
Fig. 6d, e). In ABBV-744-resistant 22RV1 cells, in which AR-dependent
transcription is driven by AR-V7 (which lacks the K630LKK633 motif),
ABBV-744 failed to inhibit the AR gene signature, induced limited BRD4 
displacement from super-enhancers, and produced weak effects on 
proliferation and senescence, collectively supporting the putative 
interaction of acetylated AR with BD2 to induce sensitivity to ABBV-744 
(Extended Data Fig. 7a-f). More mechanistic studies will be required 
to confirm this hypothesis. 

The drug-like properties of ABBV-744 enabled the investigation of 
its antitumour efficacy and tolerability. In a mouse xenograft model 
using LNCaP cells, treatment with 4.7 mg kg⁻¹ ABBV-744 (1/16 of the
maximum tolerated dose (MTD)) caused a delay in tumour growth
that was equivalent to ABBV-075 treatment at its MTD of 1 mg kg⁻¹
(Fig. 4a). Comparing efficacious exposure levels of ABBV-744 in LNCaP 


range from the first quartile minus 1.5x the interquartile range to the third 
quartile plus 1.5x the interquartile range. Unpaired two-tailed Student’s t-test 
was used to determine significance for super-enhancers versus other; n=2. 

d, BRD4 profile plots at AR-bound regions that are not located in super- 
enhancers (AR, non-SE), AR-bound super-enhancers (AR, SE), or super- 
enhancers without AR binding (non-AR SE). e, Gene track of H3K27Ac, AR, and 
BRD4 ChIP-seq signals for the indicated treatment conditions at a super- 
enhancer that is associated with several AR-dependent genes. f, LNCaP cells 
that underwent the indicated treatments for 24 h were collected for ChIP-qPCR 
to determine the binding of BRD4 to the indicated regions in the gene track. 

g, KLK2 expression in LNCaP cells that underwent the indicated treatments for
24 h was determined by qPCR. f, g, Data are mean ± s.d. (n=3 biologically
independent samples) and are representative of n>2 independent 
experiments. 
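
The box-plot convention stated in this legend (median centre line, first and third quartiles as box limits, whiskers at 1.5× the interquartile range beyond each quartile) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `box_plot_summary`, the choice of the "inclusive" quartile method and the example values are all assumptions made for the sketch.

```python
import statistics


def box_plot_summary(values):
    """Return the box-plot elements described in the legend.

    Quartiles use the 'inclusive' method of statistics.quantiles, which is
    an assumption; plotting packages differ in how they interpolate.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "median": median,                # centre line
        "q1": q1,                        # lower box limit
        "q3": q3,                        # upper box limit
        "whisker_low": q1 - 1.5 * iqr,   # first quartile minus 1.5x IQR
        "whisker_high": q3 + 1.5 * iqr,  # third quartile plus 1.5x IQR
    }


# Hypothetical log2 fold-change values, for illustration only.
example_log2_fc = [-2.0, -1.5, -1.0, -0.5, 0.0]
summary = box_plot_summary(example_log2_fc)
print(summary)
```

Many plotting tools draw whiskers to the most extreme data point that lies inside these limits rather than to the limits themselves; the sketch follows the wording of the legend.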


tumour-bearing mice (4.7 mg kg⁻¹; area under the curve, 1.1 µg h ml⁻¹)
and MTD (75 mg kg⁻¹; area under the curve, 13.1 µg h ml⁻¹) demonstrated
that ABBV-744 was able to produce significant antitumour activity at
1/12 of the highest tolerable exposure of ABBV-744 (Extended Data
Fig. 8a). The activity exhibited by ABBV-744 at 1/16 of its MTD
was superior to the activities achieved using JQ1 and iBET at their
respective MTDs or, in the case of RVX-208, at the highest feasible dose
in this model (Extended Data Fig. 8b, c). Similarly, ABBV-744 at 1/16 MTD
also displayed equivalent or better antitumour activity compared with
ABBV-075 at MTD in the enzalutamide-resistant MDA-PCa-2b xenograft
model (Fig. 4b). As a control, lowering the dose of ABBV-075 to 1/2 of the
MTD resulted in a significant reduction in antitumour activity to 42%
tumour growth inhibition in the LNCaP xenograft model. Even in the
xenograft model using OPM2 cells, one of the most sensitive models to
DbBi, ABBV-075 at 1/4 of its MTD (0.25 mg kg⁻¹) had only
marginal antitumour efficacy (Extended Data Fig. 8d, e).
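
The dose and exposure fractions quoted in this paragraph follow directly from the reported numbers. The short sketch below reproduces the arithmetic; the helper name `fold_below` is hypothetical, and only the values stated in the text are used.

```python
def fold_below(reference, value):
    """Return how many fold lower `value` is than `reference`."""
    return reference / value


# Doses (mg/kg) and exposures (area under the curve, ug h/ml) from the text.
ABBV744_EFFICACIOUS_DOSE = 4.7
ABBV744_MTD_DOSE = 75.0
ABBV744_EFFICACIOUS_AUC = 1.1
ABBV744_MTD_AUC = 13.1

dose_fold = fold_below(ABBV744_MTD_DOSE, ABBV744_EFFICACIOUS_DOSE)
exposure_fold = fold_below(ABBV744_MTD_AUC, ABBV744_EFFICACIOUS_AUC)

print(f"dose: about 1/{dose_fold:.1f} of the MTD")
print(f"exposure: about 1/{exposure_fold:.1f} of the exposure at the MTD")
```

The dose ratio rounds to 1/16 and the exposure ratio to 1/12, matching the fractions given in the text; the two differ because exposure does not scale exactly in proportion to dose.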

In toxicity studies in rats, ABBV-075 at 3 mg kg⁻¹ (3× the efficacious
exposure in the LNCaP mouse xenograft model) caused a 59% reduction
in platelets, a decrease in Alcian blue staining of the mucosa and the
loss of goblet cells. By contrast, ABBV-744 at 30 mg kg⁻¹ (25× the efficacious
exposure) triggered a reduction in platelets of only 20%, and at

Fig. 4 | ABBV-744 maintains DbBi-like activity in AR-positive prostate cancer
xenografts while displaying an improved tolerability profile. a, b, Mice
bearing LNCaP (a) or MDA-PCa-2b tumours (b) were treated daily with
enzalutamide, ABBV-075 or ABBV-744 at the indicated amounts using oral
gavage throughout the indicated treatment period. Data are mean ± s.e.m.
(n=9 mice per group in a and 7 mice per group in b). Mice treated with
4.7 mg kg⁻¹ ABBV-744 or 1 mg kg⁻¹ ABBV-075 were euthanized on day 28 to
conduct ancillary studies. c, Sprague-Dawley rats (n=3 animals per group)
were treated daily with vehicle, 3 mg kg⁻¹ ABBV-075 or 60 mg kg⁻¹ ABBV-744 for
14 days. Histopathology assessment was carried out using large-intestinal
sections after necropsy. Alcian blue staining was used to characterize goblet
cells. Representative images of haematoxylin and eosin staining (top) and
Alcian blue staining (bottom) are shown. Efficacious exposure levels of
ABBV-075 (1 mg kg⁻¹) and ABBV-744 (4.7 mg kg⁻¹) in mice and exposure levels
associated with the indicated doses of each compound in rats were determined
in separate animals used for pharmacokinetic studies (n=3 animals).


60 mg kg⁻¹ (47× the efficacious exposure) did not cause loss of goblet
cells or other gross intestinal defects (Fig. 4c and Extended Data Fig. 8a).
Similarly, 2.5 mg kg⁻¹ ABBV-075 caused germ cell degeneration in the
testes, whereas no microscopic changes in the testes were observed
with 25 mg kg⁻¹ ABBV-744. These efficacy and tolerability results collectively
suggest that selectively targeting BD2 can induce antitumour
activity in some cancer settings while mitigating key tolerability issues
of DbBi. These findings support the advancement of ABBV-744 for clinical
evaluation (ClinicalTrials.gov identifier NCT03360006) and call
for further investigation of BD2-dependent transcription programs 
to reveal additional therapeutic opportunities. 

Extended Data Fig. 1 | Characterization of ABBV-744. a, TR-FRET, surface
plasmon resonance (SPR) and NanoBRET potency and selectivity of ABBV-744.
b, Surface plasmon resonance binding of ABBV-075 and ABBV-744 to BD1 and
BD2 domains of BRD4. ABBV-075 binding curves (coloured) with fits to the 1:1
binding model (black). ABBV-744 binds to BD1 with very fast on and off kinetics,
therefore a steady-state fit to equilibrium responses was used to determine
Biacore affinities. Dissociation of ABBV-744 from BD2 is very slow and therefore
binding was profiled using the single-cycle kinetics method. All experiments
were repeated once with similar results. c, Binding affinities of ABBV-744 to
selected bromodomains for which ABBV-744 exhibited more than 50%
inhibition at 1 µM using BromoScan profiling. d, Pharmacokinetic parameters
in mice. e, ABBV-744 stability after incubation with various CYP enzymes.

Extended Data Fig. 2 | Antiproliferative activity of structurally diverse BD2
and DbBis. a, Chemical structure of A-083. b, Activity of A-083 across multiple
assays. c, Anti-proliferation activity of A-083 across the OncoPanel of
Europhin, which consists of 240 cancer cell lines across a broad spectrum of
cancer indications. d, Characterization and antiproliferative activities of
ABBV-075, ABBV-744 and BD2 and DbBis as described in the literature. e,
Antiproliferative activities of ABBV-075 and ABBV-744 against IEC-6 and LNCaP
cells and the activities of both compounds in a Mk-CFU assay (an assay that
measures the generation of megakaryocytes from mouse haematopoietic stem
cells) carried out by Stemcell Technology.

Extended Data Fig. 3 | ABBV-744 mimics enzalutamide and ABBV-075 to
block AR-dependent transcription. a, Comparison of differentially regulated
genes from this study with those reported in the literature using JQ1 and iBET.
b, Reduction in MYC and KLK2 protein levels detected by western blot after
treatment for 24 h with ABBV-075 (60 nM) or ABBV-744 (90 nM); no effect on AR
was found. ABBV-075 but not ABBV-744 increases HEXIM1 protein levels.
Representative of n=3 independent experiments with similar results. For gel
source data, see Supplementary Fig. 2. c, Biochemical, biophysical and cellular
characteristics of the BD1 inhibitor (BD1i) described in the indicated GSK
patent application. Bottom, Expression of KLK2 and MYC in LNCaP cells after
6 h treatment with ABBV-075 (60 nM), ABBV-744 (90 nM), BD1i (200 nM) or
ABBV-744 (90 nM) and BD1i (200 nM) was determined by qPCR. Data are
mean ± s.d. (n=3 biologically independent samples) and are representative of
n=2 independent experiments. d, Gene set enrichment analysis of RNA-seq
data (n=2) from LNCaP cells treated with ABBV-075, ABBV-744 or
enzalutamide. Statistical significance was determined using a false-discovery
rate (FDR) (Benjamini-Hochberg correction) and negative enrichment scores
(NES) with q<0.05 are listed in the table. Venn diagram shows the overlap of
enriched hallmarks with each treatment. AR, MYC and E2F gene set enrichment
analyses are shown as examples.

Extended Data Fig. 4 | BD2-dependent BRD4 chromatin profile association
with AR. a, AR peaks measured by AR ChIP-seq after treatment for 24 h with
DHT and DMSO, ABBV-075 or ABBV-744. As a reference, literature-reported
changes in AR peaks after JQ-1 treatment were also included. b, BRD4 but not
BRD2 or BRD3 had strong dependency scores across all prostate cancer cell
lines (left) and was correlated with ABBV-744 sensitivity (right). Dependency
scores were obtained from the DepMap portal. Scores less than -0.5 indicate
the dependence of a cancer cell line on a given gene. Dots represent the
dependency score for an individual cell line. Data are mean ± s.d. across the
group. Significance was calculated using unpaired, one-sided Student's t-tests.
ns, not significant. c, BRD4 and AR-binding profile at AR-regulated KLK genes
for which ABBV-075 (60 nM) and ABBV-744 (90 nM) in LNCaP cells or JQ-1
(500 nM) in VCaP showed similar displacement of BRD4. Loss of AR was more
notable after treatment with ABBV-075 and JQ-1 than after treatment with
ABBV-744. d, Venn diagram of BRD4-AR peak overlap in LNCaP cells. In total,
43% of AR-BRD4 common regions were located in super-enhancers.

Extended Data Fig. 5 | BD2-dependent BRD4 binding motifs and upstream
regulators. a, HOMER motifs enriched in super-enhancers in which ABBV-744
and ABBV-075 (common) displaced BRD4 or super-enhancers in which only
ABBV-075 displaced BRD4 (exclusive), n=1. Statistics were derived using FDR
(Benjamini-Hochberg correction) and q values are shown. b, Upstream
regulators for differentially expressed genes (n=2) associated with ABBV-744
and ABBV-075 BRD4-displaced super-enhancers compared with ABBV-075-exclusive
super-enhancers (n=1), as analysed by Ingenuity Pathway Analysis.
AR, E2F1 and MYC all associated with common BRD4-displaced super-enhancers.
c, Gene track examples of differential displacement pattern for
ABBV-744 and ABBV-075 commonly sensitive (ACPP) or ABBV-075 exclusive
(ZG16B).

Extended Data Fig. 6 | BD2-dependent BRD4-AR interaction. a, LNCaP cells
were treated for 16 h with DHT in the presence of vehicle, ABBV-744 (90 nM) or
ABBV-075 (60 nM) with or without trichostatin A (TSA) (0.5 µg ml⁻¹). AR
immunoprecipitation (IP) using nuclear extracts pulled down BRD4 in
trichostatin-A- and DHT-treated samples. ABBV-744 and ABBV-075 blocked
BRD4 co-immunoprecipitation with AR. Fold change values from densitometry
analysis are listed below the BRD4 blot, in which a 1.9-fold increase in the
AR:BRD4 immunocomplex was measured in the trichostatin-A- and vehicle-treated
lane compared with 0.87 or 0.88 after treatment with ABBV-744 or
ABBV-075, respectively. Western blot of 2% immunoprecipitation input
revealed no change in nuclear protein levels after inhibitor treatment. b, LNCaP
cells were treated for 16 h with DHT in the presence of vehicle, ABBV-744
(90 nM) or ABBV-075 (60 nM). CDK9 or BRD4 immunoprecipitation using
nuclear extracts pulled down BRD4 or GATA2, which is not blocked by
treatment with ABBV-744. c, LNCaP cells were treated for 16 h with DHT in the
presence of vehicle, ABBV-744 (90 nM) or ABBV-075 (60 nM). CDK9 or cyclin T1
immunoprecipitation using nuclear extracts pulled down HEXIM1, which is not
blocked or enhanced by treatment with ABBV-744. d, Alignment of a KXXK
motif in H4 and AR, and the lack of this motif in AR-V7. e, Cooperative interaction of
BD1 and BD2 of BRD4 with acetylated AR at BRD4-AR co-occupied super-enhancers
may underlie sensitivity to ABBV-744. a-c, Results are
representative of n>2 independent experiments. For a-c gel source data, see
Supplementary Fig. 2.

Extended Data Fig. 7 | 22RV1 cells are resistant to ABBV-744. a, ABBV-075 but
not ABBV-744 induces a robust dose-dependent increase of senescent
(β-galactosidase-positive) 22RV1 cells after 7 days of treatment. Data are
mean ± s.d. (n=3 biological replicates) and are representative of n=2
independent experiments. b, Scatter plot of gene expression changes (n=2)
caused by ABBV-075 (60 nM) or ABBV-744 (90 nM) treatment for 24 h in DHT-stimulated
22RV1 cells. Statistical analysis of fold change (FC) >2.0, P<0.01 was
conducted using the DESeq2 method. c, Split violin representation of DHT-regulated
compared with all differentially expressed genes in 22RV1 from RNA-seq
as shown in b. The long solid line represents the mean fold change. The
small lines represent individual data points. The dotted line represents the
overall average. Statistical significance between all versus DHT was
determined by two-tailed unpaired Student's t-test and P<0.01 by DESeq2.
ABBV-075 affects both DHT-regulated genes and a broad distribution of genes, whereas
ABBV-744 has a more limited effect on both DHT-stimulated genes and overall.
d, ABBV-075 but not ABBV-744 negatively regulated the androgen response in
22RV1 cells as shown by gene set enrichment analysis. NES > 2.0, q < 0.05
calculated using FDR (Benjamini-Hochberg correction). e, H3K27Ac and BRD4
ChIP-seq heat maps at transcription start sites and enhancers in 22RV1 cells. f,
ABBV-744 less effectively displaces BRD4 from super-enhancers in the resistant
22RV1 cell line compared with sensitive LNCaP cells.

Extended Data Fig. 8 | In vivo efficacy and tolerability of BD2 selective
inhibitors and DbBis. a, Sprague-Dawley rats (n=3 animals per group) were
treated daily with vehicle, ABBV-075 (3 mg kg⁻¹) or ABBV-744 (30 mg kg⁻¹) for
14 days, and platelet counts were determined using the standard method.
Efficacious exposure levels of ABBV-075 (1 mg kg⁻¹) and ABBV-744 (4.7 mg kg⁻¹)
in mice and exposure levels associated with the indicated doses of each
compound in rats were determined in separate pharmacokinetic studies using
different animals (n=3 animals per group). b, Antitumour activity of well-known
BET inhibitors in the xenograft model in which LNCaP cells were
implanted in the mouse flank. JQ-1 and iBET-762 were administered at their
respective MTD. RVX-208 was administered at its maximal achievable dose.
Data are mean ± s.e.m. of tumour size for each treatment group (n=6). WL,
maximum weight loss relative to initial value; FD, found dead. c, Efficacy
comparison of BET inhibitors in the LNCaP model. d, e, Mice bearing LNCaP
tumours (d; n=9 per group) or OPM2 tumours (e; n=10 per group) were treated
with vehicle or ABBV-075 using oral gavage at the indicated amounts for 21 days
(PO, QDX21). Data are mean ± s.e.m. of tumour size for each treatment group.

Extended Data Table 1 | Data collection and refinement statistics

Data collection and refinement statistics of ABBV-744 in complex with BRD2/BD1 and BRD2/BD2

                        BRD2-D2                 BRD2-D1
                        ABBV-744                ABBV-744
Data collection
Space group             P4,                     P2₁2₁2₁
Cell dimensions
  a, b, c (Å)           107.8, 107.8, 89.8      48.7, 56.1, 107.0
  α, β, γ (°)           90, 90, 90              90, 90, 90
Resolution (Å)          2.43 (2.43 - 2.48) *    1.97 (1.97 - 2.01) *
Rsym                    0.046 (0.41)            0.059 (0.95)
I/σI                    14 (2.2)                16 (2.5)
Completeness (%)        100 (100)               97.6 (97)
Redundancy              6.6 (6.4)               6.4 (6.9)
Refinement
Resolution (Å)          2.44                    1.98
No. reflections         38380                   20710
Rwork / Rfree (%)       21.1 / 23.8             19.8 / 23.3
No. atoms
  Protein               5436                    1875
  Ligand                216                     72
  Water                 298                     132
B-factors
  Protein               63                      50
  Ligand                58                      44
  Water                 52                      59
R.m.s. deviations
  Bond lengths (Å)      0.009                   0.010
  Bond angles (°)       0.93                    0.88

Crystal structure coordinates and X-ray diffraction data of ABBV-744 in complex with BD1 of BRD2 and BD2 of BRD2 have been deposited in the Protein Data Bank with accession numbers 6E6J 
and 6ONY. 
*Values in parentheses are for the highest-resolution shell. 


Extended Data Table 2 | Antiproliferative activities of ABBV-744 across cancer cell lines 


a  Anti-proliferative IC50 (nM)
Cell line      LNCaP   MDA-PCa-2b     MDV-R          22RV1     VCaP      PC3           DU-145
AR status      T878A   T878A, L702H   T878A, F876L   AR-V7     AR-V7     AR negative   AR negative
ABBV-744       At      9              18             467       354       >1000         >1000
Enzalutamide   550     >30,000        >30,000        >30,000   >30,000   >30,000       >30,000
b 
Tumor Types Cell Line ABBV-075 IC50 (nM) ABBV-744 IC50 (nM) Tumor Types Cell Line ABBV-075 IC50 (nM) ABBV-744 IC50 (nM)
AML SIG-M5 2.8 2.1 OV ES-2 6.0 830.0
AML OCI-AML2 2.7 3.0 OV COV413B 17.0 880.0
AML MV-4-11 2.6 3.1 TNBC BT549 27.9 916.0 
AML EOL1 2.8 8.4 OV OV56 21.0 916.0 
AML Kasumi 3.0 13.0 NSCLC NCI-H2347 45.0 968.0 
AML OCI-AML3 2.9 13.0 Head & Neck FADU 22.0 >1000 
AML Nomo1 3.8 18.0 Neuroblastoma Kelly 45.2 >1000 
AML HNT-34 4.4 58.0 Neuroblastoma SKNF1 75.3 >1000 
PC LNCaP 3.8 10.8 NSCLC NCI-H1792 41.0 >1000 
PC MDV-R 10.0 18.0 NSCLC NCI-H727 87.0 >1000 
PC MDA-PCa-2b 13.5 38.3 NSCLC NCI-H2170 100.0 >1000 
TNBC HCC2157 2.1 46.7 NSCLC HCC-1395 108.0 >1000 
TNBC DU4475 9.0 110.3 NSCLC NCI-H1563 117.0 >1000 
TNBC HCC1187 Jel, 120.8 NSCLC NCI-H827 988.0 >1000 
TNBC HS578T 5.4 139.7 NSCLC NCI-H661 1000.0 >1000 
OV OVCAR3 8.0 167.0 NSCLC NCI-H2935 1000.0 >1000 
TNBC MDA-MB-453 10.5 245.0 OV OVCAR8 12.0 >1000
OV A2780 16.0 290.0 OV COV434 13.0 >1000
OV PA-1 14.0 320.0 OV CaoV-3 32.0 >1000
OV OC314 10.0 348.0 PC PC3 125.5 >1000
PC VCaP 6.6 354.0 TNBC HCC1806 14.6 >1000 
TNBC MDA-MB-468 31.0 454.1 TNBC HCC38 25.9 >1000 
PC 22RV1 16.5 467.0 TNBC SUM149PT 61.0 >1000 
TNBC HCC1599 8.9 479.7 TNBC CAL120 77.3 >1000 
TNBC MDA-MB-468 26.5 518.2 TNBC HCC70 90.6 >1000 
OV SKOV3 17.0 600.0 TNBC MDA-MB-231 138.2 >1000 
OV OVCAR5 14.0 696.0 TNBC MDA-MB-436 160.1 >1000
PC DU-145 130.2 706.9 TNBC MDA-MB-157 292.7 >1000 
NSCLC NCI-H1703 48.0 806.0 TNBC HCC1937 508.2 >1000 
TNBC HCC1143 579.6 >1000 


a, Segregation of ABBV-744 sensitivity with AR status in prostate cancer cell lines. b, ABBV-075 and ABBV-744 IC50 values in a 5-day proliferation assay.

Statistical parameters


When statistical analyses are reported, confirm that the following items are present in the relevant location (e.g. figure legend, table legend, main 
text, or Methods section). 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


An indication of whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistics including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND 
variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Clearly defined error bars 
State explicitly what error bars represent (e.g. SD, SE, CI)


Our web collection on statistics for biologists may be useful. 


Software and code 


Policy information about availability of computer code 


Data collection Illumina Genome Analyzer was used for sequence data. Commercial software (Studylog Systems, Inc., South San Francisco, CA) was used to collect
in vivo tumor model data. Prestima software was used for in-life and hematology data collection. A Biacore T200 instrument and
manufacturer-provided software were used to collect SPR binding data. An Envision plate reader with manufacturer-supplied software was
used to collect TR-FRET and NanoBRET data. An Enspire plate reader with manufacturer-supplied software was used to collect cell
proliferation data.


Data analysis Commercial software was used to analyze all data in this study as described in each section of the methods. These include Microsoft 
Excel, Prism GraphPad 5, Ingenuity Pathway Analysis, ArrayStudio, Biacore T200 software from manufacturer. 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers 
upon request. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 
Data


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


RNA-seq and ChIP-seq datasets can be accessed in GEO (accessions GSE118152, GSE118247 and GSE130269). Crystal coordinates and X-ray diffraction data were deposited
in the Protein Data Bank with the accession codes 6E6J and 6ONY. Other datasets generated and/or analyzed during the current study are available from the
corresponding author on reasonable request.
For a reference copy of the document with all sections, see nature.com/authors/policies/ReportingSummary-flat.pdf


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 
Sample size For efficacy studies, a one-sided t-test was used to determine the number of animals needed to obtain 80% power at alpha = 0.05.
For rat tox studies, a sample size of n=3 animals per group was based on internal
experience of the ability to identify test-article-related changes during drug candidate selection.
Data exclusions No data were excluded from the analysis. 


Replication Experiments were repeated under the same conditions and gave similar results. The number of repeats is indicated in the figure legends.


Randomization For efficacy studies, mice were randomized into treatment groups using Studylog software (Studylog Systems, Inc., South San Francisco, CA)
based on tumor volume. For rat tox study, animal allocation to vehicle and treatment groups was at random based on body weight. 


Blinding Partial blinding was used for efficacy studies. Multiple technicians formulated and dosed compounds and randomized the groups. Additional
investigators blinded to the test agents measured tumor volumes during the study. Toxicologic data analysis is generally performed in an
unblinded fashion, which was the case for the data described in this paper.


Reporting for specific materials, systems and methods 


Materials & experimental systems Methods 
n/a | Involved in the study n/a | Involved in the study 
Unique biological materials ChIP-seq 
Antibodies Flow cytometry 
Eukaryotic cell lines MRI-based neuroimaging 


Palaeontology 


Animals and other organisms 


Human research participants 


Antibodies 


Antibodies used Information on all of the antibodies used in the study is presented in SI Table 


Validation H3K27Ac Ab noted on Active motif website to be modENCODE validated, NGS-QC certified, and validated for ChIP-Seq. BRD4 Ab 
is cited in at least 11 literature publications for ChIP and ChIP-Seq. AR Ab is cited in at least 28 literature publications including 
ChIP and ChIP-Seq applications. Antibody information is presented in SI Table.

Eukaryotic cell lines

Policy information about cell lines

Cell line source(s)
The source and authentication of all eukaryotic cells in the study are presented in SI Table.

Authentication
Cell lines were authenticated using GenePrint 10 STR Authentication Kit (Promega, Madison, WI).

Mycoplasma contamination
Cell lines were tested for mycoplasma using MycoAlert Detection Kit (Lonza, Walkersville, MD) and all lines tested negative.

Commonly misidentified lines (See ICLAC register)
No commonly misidentified lines were used in this study.


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals Male NSG mice (Jackson Laboratory), Fox Chase SCID® mice (Charles River Labs) and Sprague Dawley (Crl:CD(SD)) rats from
commercial sources were used. Male rats were 56-58 days of age at initiation of test article administration. NSG and Fox
Chase SCID® male mice were 6-8 weeks of age at the time of study initiation.


Wild animals No wild animals used in the study 


Field-collected samples No field-collected samples used in the study 


ChIP-seq 


Data deposition

Confirm that both raw and final processed data have been deposited in a public database such as GEO.

Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.

Data access links (May remain private before publication.)
Accession GSE118152, GSE118247

Files in database submission
Provide a list of all files available in the database submission.

Genome browser session (e.g. UCSC)
Provide a link to an anonymized genome browser session for "Initial submission" and "Revised version" documents only, to
enable peer review. Write "no longer applicable" for "Final submission" documents.

Methodology

Replicates
Each ChIP-Seq experiment was n=1.

Sequencing depth
All experiments were single end, 75 nt reads. For individual experiments total/usable: BRD4 DHT 39,825,247/22,844,767;
BRD4 DHT ABBV-744 37,353,171/18,060,657; BRD4 DHT ABBV-075 38,458,410/22,734,829; BRD4 DHT ENZ
34,719,339/21,325,188; AR 39,178,357/27,668,040; H3K27Ac 33,779,643/26,271,979.

Antibodies
Active Motif H3K27Ac cat#39133 lot 8, Bethyl BRD4 cat#A301-985A lot 6, Santa Cruz AR cat#sc-13062 lot B2616.

Peak calling parameters
Peaks were called using MACS2.1.0 narrow, pvalue cutoff 1e-7.

Data quality
Peaks that were on the ENCODE blacklist of known false ChIP-Seq peaks were removed.

Software
Illumina Casava 1.8 software was used for basecalling. Reads were aligned to hg19 using the BWA algorithm, and the USeq platform was used for
Intersecting Regions and Neighboring Gene identifications (http://useq.sourceforge.net/). Further analysis of aligned bam
files was done using NGSPlot (https://github.com/shenlab-sinai/ngsplot) to visualize heatmaps and generate average profile
plots. NGSPlot provided heatmap and average profile plot figures.
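
The total/usable read counts reported under sequencing depth can be turned into usable-read fractions for a quick comparison across ChIP-seq libraries. The sketch below is illustrative only: the dictionary and function names are hypothetical, and two counts whose digit grouping is irregular in the source are read here as 18,060,657 and 34,719,339, which is an assumption.

```python
# (total reads, usable reads) per library, transcribed from the
# sequencing-depth entry. Assumption: the irregularly grouped counts in the
# source are read as 18,060,657 and 34,719,339.
READ_COUNTS = {
    "BRD4 DHT": (39_825_247, 22_844_767),
    "BRD4 DHT ABBV-744": (37_353_171, 18_060_657),
    "BRD4 DHT ABBV-075": (38_458_410, 22_734_829),
    "BRD4 DHT ENZ": (34_719_339, 21_325_188),
    "AR": (39_178_357, 27_668_040),
    "H3K27Ac": (33_779_643, 26_271_979),
}


def usable_fraction(total, usable):
    """Fraction of sequenced reads reported as usable."""
    return usable / total


fractions = {
    name: usable_fraction(total, usable)
    for name, (total, usable) in READ_COUNTS.items()
}

# Print libraries from the lowest to the highest usable fraction.
for name, fraction in sorted(fractions.items(), key=lambda item: item[1]):
    print(f"{name}: {fraction:.1%} usable")
```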

PIWI-interacting RNAs (piRNAs) of between approximately 24 and 31 nucleotides in
length guide PIWI proteins to silence transposons in animal gonads, thereby ensuring
fertility. In the biogenesis of piRNAs, PIWI proteins are first loaded with
5′-monophosphorylated RNA fragments called pre-pre-piRNAs, which then undergo
endonucleolytic cleavage to produce pre-piRNAs. Subsequently, the 3′-ends of
pre-piRNAs are trimmed by the exonuclease Trimmer (PNLDC1 in mouse) and
2′-O-methylated by the methyltransferase Hen1 (HENMT1 in mouse), generating mature
piRNAs. It is assumed that the endonuclease Zucchini (MitoPLD in mouse) is a major
enzyme catalysing the cleavage of pre-pre-piRNAs into pre-piRNAs. However,
direct evidence for this model is lacking, and how pre-piRNAs are generated remains
unclear. Here, to analyse pre-piRNA production, we established a Trimmer-knockout
silkworm cell line and derived a cell-free system that faithfully recapitulates
Zucchini-mediated cleavage of PIWI-loaded pre-pre-piRNAs. We found that pre-piRNAs are
generated by parallel Zucchini-dependent and -independent mechanisms. Cleavage
by Zucchini occurs at previously unrecognized consensus motifs on pre-pre-piRNAs,
requires the RNA helicase Armitage, and is accompanied by 2′-O-methylation of
pre-piRNAs. By contrast, slicing of pre-pre-piRNAs with weak Zucchini motifs is achieved
by downstream complementary piRNAs, producing pre-piRNAs without
2′-O-methylation. Regardless of the endonucleolytic mechanism, pre-piRNAs are matured
by Trimmer and Hen1. Our findings highlight multiplexed processing of piRNA
precursors that supports robust and flexible piRNA biogenesis.


piRNAs are a class of small RNAs, approximately 24-31 nucleotides (nt) in
size, that are produced from transposons and from discrete genomic loci called
piRNA clusters and guide PIWI proteins to target transcripts. PIWI
proteins possess an endonucleolytic activity, referred to as 'slicer', which
directly cleaves target RNAs in the cytoplasm. In addition, a subset of
PIWI proteins mediates transcriptional silencing in the nucleus. In
germ cells, piRNA biogenesis is coupled with reciprocal slicing between
complementary transcripts derived from transposons and piRNA clusters,
a process called the ping-pong cycle (Extended Data Fig. 1).
To generate mature piRNAs, PIWI proteins are first loaded with long
single-stranded RNA fragments bearing a 5′ monophosphate, called
pre-pre-piRNAs. The pre-pre-piRNA is then endonucleolytically cleaved
at a position 3′ downstream of the PIWI-bound region to generate two
cleavage fragments. In mice, silkworms and many other animals,
the 5′-cleavage fragment, called a pre-piRNA, is shortened to the mature
length by Trimmer (PNLDC1 in mouse), a PARN-like 3′-to-5′ exonuclease
localized on the mitochondrial surface, and 2′-O-methylated by the
methyltransferase Hen1 (HENMT1 in mouse). The 3′ cleavage fragment
is loaded into the next PIWI protein as a new pre-pre-piRNA. As a result,
a series of 'trailing' pre-piRNAs are consecutively generated and
matured by Trimmer and Hen1 (Extended Data Fig. 1).


The endonuclease Zucchini (MitoPLD or PLD6 in mouse), which is
localized on the mitochondrial outer membrane, is assumed to mediate
cleavage of the PIWI-bound pre-pre-piRNAs. Because trailing piRNAs
often start with a 5′ uridine (U), it is believed that cleavage activity
of Zucchini has a preference for a site immediately before U in vivo.
However, purified Zucchini protein shows nonspecific endoribonuclease
activity in vitro. Owing to this discrepancy, the identity of the
endonuclease for pre-pre-piRNAs remains unclear and is ambiguously
and cautiously described in the literature. Thus, an in vitro system
that can faithfully recapitulate the endonucleolytic reaction mediated
by Zucchini is needed to resolve these ambiguities and discrepancies.


Two parallel pathways produce pre-piRNAs 


To accurately investigate how pre-piRNAs are generated from
pre-pre-piRNAs, it is necessary to block further processing of pre-piRNAs by
Trimmer. In previous studies, knockdown of Trimmer in the silkworm
cell line BmN4 resulted in only a slight extension of piRNA lengths,
suggestive of residual Trimmer activity. To overcome this, we used
CRISPR-Cas9 to generate Trimmer-knockout (Tri-KO) BmN4 cells
(Extended Data Fig. 2a-c). We identified Tri-KO lines deficient in both
Trimmer protein and the in vitro trimming activity (Extended Data
Fig. 2d-f). Tri-KO cells lacked mature 27-28 nt piRNAs and accumulated
longer RNAs of about 30-40 nt (Extended Data Fig. 2g, red line) that
co-immunoprecipitated with Siwi or BmAgo3 (Fig. 1a). Overexpression
of wild-type (WT) but not catalytically inactive Trimmer E30A
(EA) recovered mature-length piRNAs (Extended Data Fig. 2h). These
results suggest that silkworm pre-piRNAs are about 30-40 nt in length
and are trimmed by Trimmer for maturation, irrespective of which
PIWI protein they bind.

Fig. 1 | Two types of small RNAs accumulate in Tri-KO cells.
a, Immunoprecipitated (IP) Siwi or BmAgo3 from naive or Tri-KO BmN4 cells
was analysed by western blotting (WB) (top) and bound RNAs were detected by
5’ radiolabelling (bottom). IgG, immunoprecipitation with non-immunized
rabbit IgG. b, Length distribution of small RNAs mapped to 3,236 piRNA loci in
the total small RNA library from naive or Tri-KO BmN4 cells. See also Extended
Data Fig. 2i, j. c, The most abundant small RNA length among the reads sharing
the same 5’ end was defined as the peak length for each piRNA locus (for
example, peak length = 34 nt for piRNA-1543, top). The 3,236 piRNA loci were
aligned in the order of their peak lengths in the Tri-KO-NaIO₄ library (middle).
piRNA loci with peak length of ≤30 nt were defined as type-N and those with
peak length of ≥31 nt were defined as type-E (bottom). See also Extended Data
Fig. 2k. d, Changes in the length distribution of NaIO₄-treated Tri-KO small
RNAs bearing peak lengths of 28 or 29 nt (type-N), or 35 or 36 nt (type-E) caused
by depletion of BmZuc. Mock indicates knockdown for Renilla luciferase. See
also Extended Data Fig. 2l. Z_n denotes the z score at position n. RPM, reads per
million. e, Mean occurrence of piRNA 5’ ends relative to the peak position of
each piRNA locus. Pie charts show the nucleotide composition immediately
after the peak position of each piRNA locus. The numbers of analysed piRNA
loci are shown in the parentheses. The per cent nucleotide composition in the
silkworm genome corresponding to positions 11-45 of piRNA loci is
26:23:22:29 (T:G:C:A).

To characterize the pre-piRNAs in Tri-KO cells, we sequenced 20-50-nt
small RNAs from Tri-KO cells with or without NaIO₄ treatment, which
enables specific detection of 2’-O-methylated species. Small RNAs
mapping to 3,236 well-defined piRNA loci showed a sharp distribution
at 27-28 nt in naive BmN4 cells, but had a broad length distribution
around 30-40 nt in Tri-KO cells (Fig. 1b). The small RNAs in Tri-KO cells
were largely protected from NaIO₄ treatment (Fig. 1b, Extended Data
Fig. 2i), suggesting that they are 2’-O-methylated, with longer species
more efficiently methylated than shorter ones (Extended Data Fig. 2j).
We determined the most frequent small RNA length (peak length) for
each piRNA locus in the NaIO₄-treated library and plotted the peak
lengths for all 3,236 piRNA loci without considering the small RNA
abundance from each locus (Fig. 1c, Extended Data Fig. 2k). In contrast
to mature piRNAs in naive cells (Extended Data Fig. 2k), the peak
lengths of Tri-KO small RNAs showed a clear bimodal distribution: one
peak at 27-28 nt and a broader peak around 35 nt (Fig. 1c, bottom). For
simplicity, we refer to the piRNA loci with peak length up to 30 nt as
type-N (non-extended) and those with peak length greater than 30 nt
as type-E (extended).
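
The peak-length classification described above can be sketched in a few
lines of Python. This is a minimal illustration and not the pipeline used
in the study: the function names, the tie-breaking rule and the example
read lengths are assumptions introduced here; only the 30-nt threshold
is taken from the text.

```python
from collections import Counter

TYPE_N_MAX = 30  # peak length up to 30 nt is type-N; longer is type-E (from the text)


def peak_length(read_lengths):
    """Most frequent read length among reads sharing one 5' end."""
    counts = Counter(read_lengths)
    # Assumption: ties are broken toward the shorter length; the text gives no rule.
    return min(counts, key=lambda n: (-counts[n], n))


def classify_locus(read_lengths):
    """Label a locus as type-N or type-E from its peak length."""
    return "type-N" if peak_length(read_lengths) <= TYPE_N_MAX else "type-E"


# Hypothetical read lengths for two loci (illustrative, not data from the study).
loci = {
    "locus_A": [28, 28, 29, 28, 27, 30],
    "locus_B": [34, 35, 35, 36, 35, 33],
}
for name, lengths in loci.items():
    print(name, peak_length(lengths), classify_locus(lengths))
```

As in the analysis described above, the label depends only on the modal
length at each locus, not on how many reads the locus contributes.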

To test the requirement for Bombyx mori Zucchini (BmZuc) during
processing of pre-pre-piRNAs into pre-piRNAs in silkworms, we
knocked down BmZuc in Tri-KO cells. Notably, depletion of BmZuc did
not affect the length distribution of type-N small RNAs, but strongly
decreased peaks of length greater than 30 nt in the type-E small RNAs
(Fig. 1d, Extended Data Fig. 2l and Supplementary Note 1), suggesting
that BmZuc is required to produce pre-piRNAs from type-E loci.
Supporting this idea, the genomic nucleotide immediately following
the 3’ end of type-E small RNAs in Tri-KO cells tended to be U (Fig. 1e),
a proposed hallmark of Zucchini-mediated cleavage called the ‘+1U
bias’. Moreover, type-E small RNAs in Tri-KO cells were frequently
accompanied by immediately downstream piRNAs on the same
genomic strand (Fig. 1e), a pattern typically observed in trailing piRNAs
or pre-piRNAs. By contrast, these signatures were nearly absent
in Tri-KO type-N small RNAs (Fig. 1e). From these results, we conclude
that BmZuc mediates the production of type-E pre-piRNAs, whereas
type-N pre-piRNAs are generated via a BmZuc-independent pathway.
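
To make the ‘+1U bias’ concrete, the following sketch tallies the
nucleotide found immediately after the 3’ end of each small RNA. It is
an illustration under stated assumptions rather than the analysis code
of the study: the input format (a sequence starting at the 5’ end of the
locus, paired with its peak length), the function name and the toy
sequences are all hypothetical.

```python
from collections import Counter


def plus_one_composition(records):
    """Fraction of each nucleotide at position +1 (just after the 3' end).

    records: iterable of (sequence, peak_length); each sequence starts at the
    5' end of the locus and extends beyond the 3' end (assumed input format).
    """
    counts = Counter()
    for seq, peak_len in records:
        if peak_len < len(seq):         # need at least one downstream base
            counts[seq[peak_len]] += 1  # 0-based index peak_len is position +1
    total = sum(counts.values())
    return {nt: n / total for nt, n in counts.items()} if total else {}


# Toy sequences (not data from the study); each has a peak length of 5.
toy = [("GGACAU", 5), ("CCGAUU", 5), ("AGGCCU", 5), ("GACGGA", 5)]
print(plus_one_composition(toy))
```

A bias is then judged against the background composition, for which the
legend of Fig. 1 gives 26:23:22:29 (T:G:C:A) in the silkworm genome.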

We next investigated whether BmZuc generates pre-piRNAs for 
both Siwi and BmAgo3, the two PIWI proteins in silkworms. To this 
end, we first defined Siwi- and BmAgo3-dominant piRNA loci (Extended 
Data Fig. 3a, see Methods). We then plotted the peak length of Tri-KO 
small RNAs separately for Siwi- or BmAgo3-dominant piRNA loci, and 
observed similar bimodal distributions corresponding to type-N and 
type-E (Extended Data Fig. 3b). BmZuc depletion reduced the peak- 
length populations of type-E small RNAs for both Siwi- and BmAgo3- 
dominant piRNA loci (Extended Data Fig. 3c), suggesting that BmZuc 
mediates type-E pre-piRNA production regardless of which PIWI protein 
is bound to the pre-pre-piRNA. However, compared with Siwi-dominant 
type-E pre-piRNAs, BmAgo3-dominant type-E pre-piRNAs showed a 
weaker +1U bias (Extended Data Fig. 3d) and had a lower frequency 
of immediately downstream piRNAs (Extended Data Fig. 3e). Thus, 
even though BmZuc mediates pre-pre-piRNA cleavage for both Siwi 
and BmAgo3, the production of downstream trailing piRNAs is largely 
restricted to Siwi. 

We next investigated how pre-piRNAs in the type-N group are generated.
In flies, most Ago3-bound piRNAs are processed from pre-piRNAs
generated by piRNA-guided slicing at a downstream position
(Supplementary Discussion). To determine whether the 3’ end of silkworm
pre-piRNAs can be generated by downstream slicing, we analysed sense and
antisense piRNAs mapped to the downstream region of type-N or type-E
piRNA loci. The abundance of sense piRNAs in the downstream region
was similar for type-N and type-E piRNA loci (Fig. 2a, sense strand). By
contrast, antisense piRNAs at approximately 41-52 nt from their 5’ ends
were observed more frequently in the downstream region of type-N
loci (Fig. 2a, antisense strand), for both Siwi-dominant and
BmAgo3-dominant loci (Extended Data Fig. 3f). Antisense piRNAs in this region
can, in theory, guide slicing of pre-pre-piRNAs and generate the 3’ end
of 31-42 nt pre-piRNAs. Therefore, unlike in flies, downstream
slicing of pre-pre-piRNAs in silkworms is probably determined by the
context of the cleavage site and not by the identity of PIWI proteins.
However, the peak lengths of Tri-KO small RNAs in the type-N group
(30 nt or shorter) were shorter than the expected pre-piRNA lengths
based on the downstream slicing sites (31-42 nt) (Fig. 1c), implying
that they are somehow fragmented into shorter species in Tri-KO cells.
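
The relationship between the position of a downstream antisense piRNA
and the expected pre-piRNA length can be made explicit with a small
calculation. The sketch assumes the canonical 10-nt offset of PIWI
slicing (the target is cut opposite guide nucleotides 10 and 11) and
assumes that distances are 1-based positions on the sense transcript
paired with the 5’ nucleotide of the antisense guide. The function name
is hypothetical.

```python
SLICE_OFFSET = 10  # target is cut opposite guide nucleotides 10 and 11 (assumed convention)


def expected_pre_pirna_length(antisense_5p_position):
    """Length of the 5' cleavage fragment left by guided slicing.

    antisense_5p_position: 1-based position on the sense pre-pre-piRNA that
    pairs with the 5' nucleotide of the antisense guide (assumed convention).
    """
    if antisense_5p_position <= SLICE_OFFSET:
        raise ValueError("guide must lie more than 10 nt downstream of the 5' end")
    return antisense_5p_position - SLICE_OFFSET


# The 41-52 nt window quoted above maps onto pre-piRNAs of 31-42 nt.
for position in (41, 52):
    print(position, "->", expected_pre_pirna_length(position))
```

Under these assumptions the two ends of the 41-52 nt window give 31 and
42 nt, matching the pre-piRNA lengths expected from downstream slicing.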

To further investigate pre-piRNA generation, we examined four
piRNA loci representing a Siwi-dominant (piRNA-1528) or
BmAgo3-dominant (piRNA-66) type-E locus, as well as a Siwi-dominant
(piRNA-2986) or BmAgo3-dominant (piRNA-304) type-N locus (Extended Data
Fig. 3g). Type-N piRNA-2986 and 304, but not type-E piRNA-1528 and 66,
have downstream ping-pong sites with readily detectable complementary
piRNAs, which can guide PIWI-catalysed slicing of pre-pre-piRNAs.
We examined small RNAs deriving from these loci in Tri-KO cells by
northern blotting, and confirmed the accumulation of corresponding
pre-piRNAs (Fig. 2b, mock, red lines). BmZuc depletion decreased
type-E, but not type-N, pre-piRNA levels (Fig. 2b), reinforcing the idea
that type-E pre-piRNAs are generated via BmZuc-mediated cleavage.
Notably, type-N pre-piRNAs showed many shorter heterogeneous
RNA fragments (Fig. 2b). These data suggest that pre-piRNAs generated
by PIWI-catalysed slicing are intrinsically unstable and prone
to non-specific degradation, at least in the absence of Trimmer. This
could explain why the peak lengths of type-N small RNAs were 30 nt
or shorter in Tri-KO cells (Fig. 1c). By contrast, type-E pre-piRNAs were
more stable, especially pre-piRNA-66.

Fig. 2 | The 3’ ends of type-E, but not type-N, pre-piRNAs are efficiently
2’-O-methylated. a, The 5’ ends of piRNAs mapped to 20-100 nt downstream of
piRNA loci were mapped on the antisense (left) or sense (right) genomic strand.
Type-N piRNAs have more antisense piRNAs at ~41-52 nt from the 5’ ends than
type-E piRNAs (two-sided Wilcoxon signed-rank test, n = 12). See also Extended
Data Fig. 3f. b, c, Northern blot analysis of the four representative piRNAs in
naive or Tri-KO BmN4 cells depleted of the indicated protein by RNAi. In b, total
RNAs were treated with or without NaIO₄ β-elimination. Asterisks,
BmZuc-dependent fragments. Red bars, putative pre-piRNAs. See also Extended Data
Fig. 3g.

Given our observation that the 2’-O-methylation level was generally
higher for longer small RNAs than shorter ones in Tri-KO cells (Fig. 1b
and Extended Data Fig. 2i, j), we predicted that type-E pre-piRNAs are
efficiently 2’-O-methylated. Indeed, type-E pre-piRNA-1528 and
pre-piRNA-66 were refractory to NaIO₄ treatment (Fig. 2b). By contrast,
type-N pre-piRNA-2986 and pre-piRNA-304, as well as their degradation
products, were mostly, if not completely, shortened by one
nucleotide by a NaIO₄ β-elimination reaction (Fig. 2b). Thus, type-E
pre-piRNAs produced via BmZuc-mediated cleavage are more efficiently
2’-O-methylated than type-N pre-piRNAs generated by downstream
piRNA-guided slicing. Consistently, type-E pre-piRNAs 1528
and 66 became prone to nonspecific degradation upon depletion of
the 2’-O-methyltransferase BmHen1 in Tri-KO cells, whereas type-N
pre-piRNAs 2986 and 304 and their degradation products, which are
intrinsically poorly 2’-O-methylated, were largely unaffected (Fig. 2c).
Mature piRNAs were fully 2’-O-methylated in naive BmN4 cells, regardless
of how their pre-piRNAs are generated (Fig. 2b, Naive BmN4), supporting
the model that 3’-end trimming by Trimmer is tightly coupled
with 2’-O-methylation by BmHen1.

Type-N and type-E piRNA loci are heterogeneously distributed even
within a single transposon (Extended Data Fig. 3h), suggesting that
how pre-piRNAs are produced is determined at the level of individual
piRNA loci. We also note that the separation between the type-N and
type-E groups is not absolute; there are many cases in which pre-piRNAs
are produced by both mechanisms, as represented by pre-piRNA-304
(Fig. 2b, asterisks) and pre-piRNA-1249 (Extended Data Fig. 3i).

Fig. 3 | BmZuc requires BmArmi, BmGPAT1 and BmGasz for cleavage of
Siwi-loaded pre-pre-piRNAs in vitro. a, Schematic of in vitro BmZuc cleavage assay.
See also Extended Data Fig. 4a. b, RNA substrates used in c-e, h and i. c-e,
Siwi-loaded 40-nt poly(U)-containing RNA (top, c) or 80-nt poly(U)-containing
RNA (bottom; d, e) was incubated with
1,000g pellet from naive or Tri-KO cells overexpressing BmZuc(WT) or the
catalytic mutant BmZuc(HN) (c, d) and/or BmArmi(WT) or the ATP-binding
mutant BmArmi(KA) (d), or 1,000g pellet from Tri-KO cells depleted of the
indicated protein by RNAi (e). The expression of Flag-tagged BmZuc and
GFP-tagged BmArmi was confirmed by western blot (c, d, bottom). See also
Extended Data Fig. 4c. f, Western blot analysis of the 1,000g pellet used in e.
See also Extended Data Fig. 4e. g, Northern blot analysis of representative
type-E or type-N pre-piRNAs in Tri-KO cells depleted of the indicated proteins
by RNAi. Asterisks, BmZuc-dependent fragments. Red bars, putative
pre-piRNAs. h, i, Siwi-loaded poly(U)-containing RNA was incubated with 1,000g
pellet from Tri-KO cells overexpressing BmZuc and BmArmi. After incubation,
RNAs were extracted and treated with NaIO₄ followed by β-elimination (h), or
naive 1,000g pellet was added and further incubated (i). j, RNA substrates
used in k. k, Siwi-loaded 50-nt poly(C)-containing RNAs bearing U, A or G at
position 37 were incubated with 1,000g
pellet from Tri-KO cells overexpressing BmZuc and BmArmi or BmZuc(HN).
l, Quantification of the 36-nt cleavage fragments produced by 1,000g pellet
from Tri-KO cells overexpressing BmZuc and BmArmi in k. Data are mean ± s.d.
from four technically independent experiments. Bonferroni-corrected
P values from two-sided paired t-tests are as follows: *P = 0.0163;
**P = 0.00106; ***P = 0.000485.


In vitro analysis of BmZuc activity 

We next sought to recapitulate pre-piRNA production in vitro. We
previously established a cell-free system to monitor the
3’-end-trimming reaction by Trimmer using mitochondria-containing 1,000g
pellets (Extended Data Fig. 4a, left). Both Trimmer and BmZuc are
mitochondrial outer-membrane proteins, so we anticipated that
the same strategy could be applied to detect BmZuc activity in the
1,000g pellet from Tri-KO cell homogenate (Fig. 3a, Extended Data
Fig. 4a, right). Since Zucchini is thought to cleave 5’ of U, we first used
a series of single-stranded (ss)RNAs bearing a poly(U) sequence as
model substrates (Fig. 3b). Incubation of Siwi-loaded 40-nt
poly(U)-containing RNA with Tri-KO 1,000g pellet produced an RNA fragment
of about 36 nt, which is much longer than the mature trimming product
observed with naive 1,000g pellet (Fig. 3c). The 36-nt fragment
was also observed when we used an 80-nt poly(U)-containing RNA
(Extended Data Fig. 4b, ATP+). For both 40- and 80-nt RNAs,
overexpression of catalytically inactive BmZuc H141N (HN) decreased the
~36-nt signal (Fig. 3c, Extended Data Fig. 4b), suggesting that active
BmZuc is required to generate this fragment. Thus, BmZuc catalyses
the production of the 36-nt RNA fragment in vitro, regardless of the
initial length of Siwi-loaded poly(U)-containing RNAs. This is consistent
with the idea that the PIWI proteins themselves position Zucchini
on pre-pre-piRNAs.

Depletion of ATP from the in vitro reaction abolished the 36-nt cleavage
product (Extended Data Fig. 4b, ATP−), suggesting that BmZuc-mediated
cleavage requires ATP. Purified Zucchini cleaves ssRNAs
in an ATP-independent manner, whereas Armitage (MOV10L1 in
mice), a factor required for the biogenesis of trailing piRNAs, is an
ATP-dependent RNA helicase. To examine whether BmArmi is required
for BmZuc-mediated cleavage in vitro, we overexpressed wild-type
BmArmi or its ATP-binding mutant, K692A (KA), with or without BmZuc
in Tri-KO cells. Overexpression of wild-type BmArmi alone strongly
promoted in vitro cleavage of the Siwi-loaded 80-nt RNA, suggesting that
BmArmi is a rate-limiting factor for BmZuc-mediated cleavage (Fig. 3d).
By contrast, overexpression of BmArmi(KA) inhibited the cleavage
reaction, indicating the importance of the ATPase activity (Fig. 3d).
Knockdown of BmZuc or BmArmi in Tri-KO cells abolished the production
of the 36-nt RNA fragment in vitro and biogenesis of endogenous
type-E, but not type-N, pre-piRNAs, confirming their requirement for
the cleavage reaction (Fig. 3e-g, Extended Data Fig. 4c). In addition to
BmArmi, BmZuc-mediated cleavage required two other proteins localized
on the mitochondrial surface, BmGPAT1 and BmGasz (Fig. 3e-g,
Extended Data Fig. 4c-e, Supplementary Discussion), homologues of
which have been genetically implicated in Zucchini-mediated piRNA
production in flies and mice.

We found that the BmZuc in vitro cleavage product of about 36 nt
was at least partly resistant to NaIO₄ treatment, suggesting that it is
protected by 2’-O-methylation (Fig. 3h). Thus, our in vitro system
properly recapitulates BmZuc-mediated cleavage of pre-pre-piRNAs
and the production of 2’-O-methylated type-E pre-piRNAs. Finally, we
examined whether Trimmer can trim the cleavage product generated
by BmZuc to produce mature piRNAs, recapitulating processing in vivo.
The 36-nt BmZuc cleavage product was efficiently converted into
27-28-nt mature piRNAs by naive 1,000g pellet, which contains endogenous
Trimmer (Fig. 3i). This result validates the stepwise 3’-end maturation
mechanism of type-E piRNAs: BmZuc cleaves pre-pre-piRNAs to generate
pre-piRNAs, which are trimmed to the mature length by Trimmer.

Previous genetic and deep-sequencing analyses have suggested
that Zucchini preferentially cleaves immediately 5’ to U in vivo.
We also observed that the 3’ ends of type-E pre-piRNAs in Tri-KO cells
have a modest +1U bias, especially for type-E pre-piRNAs bound to
Siwi (Fig. 1e, Extended Data Fig. 3d). However, previous biochemical
analyses using purified Zucchini proteins and naked RNAs have failed to
detect this U preference. We applied our new in vitro system using
mitochondria-containing pellets and Siwi-loaded pre-pre-piRNAs to
revisit this inconsistency. We performed the in vitro BmZuc cleavage
assay with a 50-nt RNA bearing a poly(C) sequence as well as variants
in which the C at position 37 was substituted with U, A or G (Fig. 3j).
Compared with the 37A, 37G and 37C RNAs, the 37U RNA substrate yielded
moderately but significantly increased levels of the 36-nt cleavage product,
in a manner dependent on the catalytic activity of BmZuc (Fig. 3k, l).
Thus, our system recapitulates the U preference of BmZuc, consistent
with our bioinformatics analysis of type-E pre-piRNAs (Fig. 1e, Extended
Data Fig. 3d).


BmZuc motif dictates piRNA biogenesis 


The moderate U preference of BmZuc was apparent for poly(C)-based
sequences in vitro (Fig. 3j-l). However, BmZuc does not always cleave
immediately 5’ to U in our in vitro system (Extended Data Fig. 4f).
Moreover, natural type-E pre-piRNAs showed only a modest +1U bias (Fig. 1e,
Extended Data Fig. 3d). Thus, the proposed U preference does not fully
explain how the cleavage site is chosen by BmZuc. To investigate the
substrate specificity of BmZuc in a comprehensive and unbiased manner,
we performed a screen in Tri-KO cells. In brief, we constructed a
plasmid-based library that expresses 35 nt of random sequence flanked
by target sites for an abundant BmAgo3-dominant piRNA (Fig. 4a,
see Methods and Supplementary Note 2). The transcripts are expected
to be sliced by the BmAgo3-dominant piRNA, loaded into Siwi via the
ping-pong pathway as new pre-pre-piRNAs, and cleaved by BmZuc
within the downstream randomized region (Fig. 4a), producing various
type-E pre-piRNAs. We first sequenced the library-derived small RNAs
and examined their peak length distribution (Extended Data Fig. 5a).
We observed library-derived small RNAs around 35 nt, recapitulating
the size range of endogenous type-E pre-piRNAs. These approximately
35-nt RNAs were enhanced by overexpression of BmZuc and BmArmi,
and strongly inhibited by BmZuc(HN), indicating that they are generated
by BmZuc-mediated cleavage. We then aligned them at the 3’ ends
of their peak length (that is, putative BmZuc cleavage sites, defined
as position 0) and analysed the nucleotide frequencies at each position.
Focusing on the six nucleotides with the highest frequencies, we
identified a sequence motif (-10A, -2A, -1U, 0U, +1U, +4C) (Fig. 4b),
which was also consistently observed in the BmZuc + BmArmi
overexpression condition.
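
The alignment step described above, in which small RNAs are anchored at
the 3’ end of their peak length (position 0) and nucleotide frequencies
are tallied per position, can be sketched as follows. This is an
illustration only: the input format, the function name and the two toy
sequences are assumptions, and the normalization to library composition
mentioned in the legend of Fig. 4e is omitted for brevity.

```python
from collections import Counter


def aligned_frequencies(records, offsets=range(-10, 5)):
    """Per-position nucleotide frequencies around the putative cleavage site.

    records: iterable of (sequence, peak_length); position 0 is the last
    nucleotide of the peak-length small RNA (0-based index peak_length - 1).
    """
    counts = {off: Counter() for off in offsets}
    for seq, peak_len in records:
        zero = peak_len - 1
        for off in offsets:
            i = zero + off
            if 0 <= i < len(seq):
                counts[off][seq[i]] += 1
    return {
        off: {nt: n / sum(c.values()) for nt, n in c.items()}
        for off, c in counts.items()
        if c
    }


# Toy transcripts (not data from the study) with peak lengths of 35 and 34 nt.
toy = [
    ("G" * 24 + "A" + "C" * 7 + "AUUU" + "GGC" + "G" * 5, 35),
    ("C" * 23 + "A" + "G" * 7 + "AUUA" + "GGC" + "C" * 5, 34),
]
freqs = aligned_frequencies(toy)
for off in (-10, -2, -1, 0, 1, 4):
    print(off, freqs[off])
```

Positions at which one nucleotide dominates across many sequences are
the candidates for a consensus motif of the kind reported in Fig. 4b.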

To further investigate the ‘BmZuc motif’ in Siwi-bound pre-pre-piRNAs, 
we analysed two representative sequences from the library (84497 
and 111750) that contain all six consensus nucleotides. We generated a 
series of mutants that alter the consensus sequence and performed the 
BmZuc cleavage assay (Fig. 4c, d, Extended Data Fig. 5b, c). Wild-type 
sequences showed site-specific cleavage at 34 or 35 nt, as expected 
from the in-cell screen. By contrast, ‘All mut’ sequences, in which all the 
six consensus nucleotides were mutated, lacked site-specific cleavage 
(Fig. 4d, Extended Data Fig. 5c), providing further supporting evidence 
that this motif determines the BmZuc cleavage site. Unexpectedly, 
mutating only the +1U, the proposed hallmark of Zucchini-mediated 
cleavage, did not inhibit BmZuc-mediated site-specific cleavage of 
these sequences (Fig. 4d, Extended Data Fig. 5c, +1U mut.). However, 
mutating -1U and 0U together strongly inhibited the cleavage at the 
correct position, whereas mutating -10A, -2A and +4C together had 
a minor effect. In sum, our findings reveal a previously unrecognized 
consensus motif that is important for BmZuc to precisely determine 
the cleavage site. 

BmZuc generates the 3’ ends for both Siwi- and BmAgo3-loaded
pre-piRNAs (Extended Data Fig. 3b, c). To investigate whether BmZuc has a
different nucleotide preference for pre-pre-piRNAs bound to BmAgo3,
we constructed a reciprocal plasmid library whose transcripts were
loaded into BmAgo3 as pre-pre-piRNAs with a randomized sequence.
We then performed co-immunoprecipitation with Siwi from the Tri-KO
cells transfected with the original library and co-immunoprecipitation
with BmAgo3 from the cells transfected with the reciprocal library,
and analysed the bound small RNAs with peak lengths of 31-44 nt after
NaIO₄ treatment (Extended Data Fig. 5d, e). As expected, the small
RNAs immunoprecipitating with Siwi showed nucleotide preferences
around their 3’ ends that were very similar to the BmZuc motif identified
by the non-immunoprecipitation experiment using the same plasmid library
(Fig. 4b, e). The small RNAs derived from the reciprocal plasmid library
that immunoprecipitated with BmAgo3 also exhibited preferences for
-10A and +4C. However, the nucleotide bias around the BmZuc cleavage
site (-1U, 0U and +1U) was markedly lower for small RNAs
immunoprecipitating with BmAgo3 than for those immunoprecipitating with Siwi
(Fig. 4e). Thus, BmZuc has similar but distinct nucleotide preferences
for Siwi-loaded and BmAgo3-loaded pre-pre-piRNAs.

To determine whether this difference is also observed in the endogenous
piRNA loci, we analysed the nucleotide frequency around the
3’ end of Siwi- or BmAgo3-dominant type-E pre-piRNAs in the
NaIO₄-treated Tri-KO small RNAs (Extended Data Fig. 3b). The -10A, -2A and
+4C sequences were fairly well conserved in the type-E pre-piRNAs for
both Siwi and BmAgo3 (Extended Data Fig. 5f). Notably, as observed
in the randomized libraries, Siwi-dominant and BmAgo3-dominant
type-E pre-piRNAs showed differences around the BmZuc cleavage site
(-1U, 0U and +1U), with the +1U bias nearly lost in BmAgo3-dominant
type-E pre-piRNAs (Extended Data Figs. 3d, 5f). These data suggest that
the identity of PIWI proteins can influence the substrate specificity
of BmZuc (Fig. 4e, Extended Data Fig. 5f) as well as the production of
downstream trailing piRNAs (Extended Data Fig. 3e). In other words,
although +1U is tightly linked with trailing piRNA production, it is not
a prerequisite for BmZuc-mediated cleavage per se. The strong +1U
bias observed for fly Piwi-bound piRNAs and mouse MILI- and
MIWI-bound pachytene pre-piRNAs might reflect the efficient production
of trailing piRNAs for these PIWI proteins.

Fig. 4 | Identification of BmZuc consensus motifs. a, Schematic of the screen
to identify BmZuc motifs. b, Top 6 nucleotides with highest frequencies
(coloured) in small RNAs derived from the randomized library, aligned to the
BmZuc cleavage site. c, RNA substrates used in d. The top 6 nucleotides in the
BmZuc motif are shown in colour and their mutations (mut.) are shown in black
(exc., except). d, BmZuc-mediated cleavage assay using Siwi-loaded
84497-derived RNAs. Each gel image was adjusted to equalize the loading
signal. See also Extended Data Fig. 5b, c. e, Nucleotide frequency around the
BmZuc cleavage sites for Siwi-bound (left) or BmAgo3-bound (right) small
RNAs with peak lengths of 31-44 nt derived from the randomized sequence
libraries. The frequency was normalized to the nucleotide composition in the
library of the randomized region. The 6 nucleotides in the BmZuc motif are
highlighted. See also Extended Data Fig. 5d-f. f, Similarity scores with the
weighted BmZuc motif (BmZuc score) for Siwi or BmAgo3 were calculated for
extracted genomic sequences from Siwi- or BmAgo3-dominant piRNA loci in
sliding windows and plotted in the same order as in Fig. 1c. Red lines indicate
the actual 3’ ends of Tri-KO type-E small RNAs in Fig. 1c. See also Extended Data
Fig. 5g, i. g, BmZuc scores for Siwi were calculated for 84497-derived RNAs in
sliding windows and plotted as in d. See also Extended Data Fig. 5g. h, A model
for pre-piRNA production and 3’-end maturation in silkworms. Silkworm
pre-piRNAs are generated via two parallel endonucleolytic mechanisms,
BmZuc-mediated cleavage and PIWI-catalysed slicing. BmZuc shows similar but
distinct nucleotide preferences between Siwi- and BmAgo3-loaded
pre-pre-piRNAs. The ‘+1U’ bias, the previously recognized hallmark of
Zucchini-mediated cleavage, is modest for Siwi but absent for BmAgo3. The 3’ ends of
pre-piRNAs are matured by Trimmer and Hen1, regardless of how they are
produced. Me, 2’-O-methylation.

Having uncovered BmZuc motifs that help to define the 3’-end of 
endogenous type-E pre-piRNAs, we determined whether we could 
use these motifs to predict where BmZuc cleaves in pre-pre-piRNAs. 
We defined the similarity score to the weighted BmZuc motif (Fig. 4e, 
Extended Data Fig. 5g, see Methods) and plotted this against the posi- 
tions of the predicted BmZuc cleavage sites. The position that gave 
the maximum similarity score agreed well with the actual peak length 
(that is, the actual BmZuc cleavage site) of endogenous type-E pre- 
piRNAs (Fig. 4f). Moreover, the in vitro cleavage patterns of Siwi-loaded 
84497 or 111750 RNAs matched well with their calculated similarity 
scores (Fig. 4g, Extended Data Fig. 5h), highlighting the critical role of 
the BmZuc motif in determining the cleavage site. In fact, the simple 
presence or absence of BmZuc motifs in pre-pre-piRNAs can dictate by
which endonucleolytic mechanism they are cleaved (Extended Data
Fig. 5i and Supplementary Discussion). Thus, the sequence context
of pre-pre-piRNAs has a major role in determining how silkworm pre- 
piRNAs are produced. 
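
The sliding-window prediction described above can be illustrated with a
simple scoring function. The sketch below is not the BmZuc score defined
in the Methods: the six consensus positions are taken from the text, but
the weights, the function names and the toy sequence are hypothetical
placeholders, and the candidate range of 31-44 nt simply mirrors the
peak lengths analysed for the randomized libraries.

```python
# Consensus nucleotides from the text; the weights are hypothetical placeholders.
MOTIF = {
    -10: ("A", 1.0),
    -2: ("A", 1.0),
    -1: ("U", 2.0),
    0: ("U", 2.0),
    1: ("U", 1.0),
    4: ("C", 1.0),
}


def bmzuc_score(seq, zero_index):
    """Sum the weights of motif positions matched around a candidate site."""
    score = 0.0
    for offset, (nt, weight) in MOTIF.items():
        i = zero_index + offset
        if 0 <= i < len(seq) and seq[i] == nt:
            score += weight
    return score


def predict_cleavage_length(seq, min_len=31, max_len=44):
    """Candidate pre-piRNA length whose 3' end (position 0) scores highest."""
    return max(
        range(min_len, max_len + 1),
        key=lambda length: bmzuc_score(seq, length - 1),
    )


# Toy pre-pre-piRNA (not from the study) carrying the full motif at 35 nt.
toy = "G" * 24 + "A" + "C" * 7 + "AUUU" + "GGC" + "G" * 5
print(predict_cleavage_length(toy), bmzuc_score(toy, 34))
```

Comparing the best-scoring position with the observed peak length, locus
by locus, is the kind of agreement summarized in Fig. 4f.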


Discussion 


Here we show that silkworm pre-piRNAs are generated by two parallel
endonucleolytic mechanisms: BmZuc-mediated cleavage and
PIWI-catalysed slicing (Fig. 4h). This multiplexed system supports robust
and flexible piRNA biogenesis in silkworms, which have only two PIWI
proteins (Supplementary Discussion). Regardless of how pre-piRNAs
are generated and to which PIWI protein they are bound, Trimmer is
essential for the maturation of piRNAs; this appears to be true also in
mice and many other species (Supplementary Discussion). It has been
reported that Zucchini preferentially cleaves immediately 5’ to U,
highlighting the +1U bias as the one (and only) signature of
Zucchini-mediated cleavage. Our data suggest that +1U alone is insufficient to
determine the BmZuc cleavage sites and instead reveal previously
unrecognized consensus motifs preferred by BmZuc (Fig. 4, Extended
Data Fig. 6 and Supplementary Discussion). How these specific motifs
are recognized warrants future investigation. Given that isolated
Zucchini proteins do not show any apparent nucleotide specificity, we
speculate that it is not Zucchini itself, but rather a ‘reaction platform’
on the mitochondrial surface formed by proteins such as Armitage
and Gasz, that has an important role in determining the cleavage site
(Supplementary Discussion).


Methods


Cell culture, plasmid transfection, and generation of stable or 
knockout cell line in BmN4 cells 

BmN4 cells (provided by T. Kusakabe, Kyushu University; not
authenticated and not tested for mycoplasma contamination) were cultured
at 27 °C in IPL-41 medium (AppliChem) supplemented with 10% fetal
bovine serum. For plasmid transfection, 5-7.5 µg of plasmid DNAs
were transfected into BmN4 cells (2.5 x 10° cells per 10 cm dish) with
X-tremeGENE HP DNA Transfection Reagent (Sigma). For generation
of stable cells expressing GFP-BmArmi, BmN4 cells were transfected
with a GFP-tagged BmArmi expression vector and selected under
10 µg/ml puromycin for 3 weeks. For generation of the Trimmer knockout
cell line, BmN4 cells were co-transfected with pIEx1-MychCas9NLS
expression vector and pBS-BmU6-sgTrimmer expression vector. One
week later, the cells were reseeded at a low density (~1-4 x 10* cells per
15 cm dish) and cultured in 50-75% conditioned medium. About 3 weeks
later, colonies were picked up under a microscope.


Plasmid construction 

pIEx1-Trimmer WT and the catalytic mutant (E30A) were described
previously. The primer sequences for plasmid construction are listed
in Supplementary Table 1. 


pIEx1-MychCas9NLS. A DNA fragment coding Myc-hCas9-NLS was
amplified from pRB14 (a gift from K. Forstemann) and cloned into pIEx-1
vector (Millipore/Novagen) by In-Fusion HD cloning kit (Takara Clontech).


pBS-BmU6-sgTrimmer. To generate pBS-BmU6-BbsI-chiRNA vector,
the fly U6 promoter in pBS-U6-BbsI-chiRNA expression vector (a gift
from K. Forstemann) was replaced with the Bombyx mori U6
promoter amplified from the BmN4 genome. Synthesized DNA oligos
for Trimmer sgRNA were annealed and inserted into BbsI-digested
pBS-BmU6-BbsI-chiRNA vector.


pIExZ-BmZuc WT, H141N. To generate pIExZ vector, the ampicillin resistance
gene and its promoter sequence in pIEx-1 vector (Millipore/Novagen) were
replaced by ie2 promoter and Zeocin resistance gene amplified from
pIZ/V5-His vector (Thermo Fisher/Invitrogen). To enhance the expression, BmZuc
coding sequence was codon-optimized to Bombyx mori using EMBOSS
Backtranseq (http://www.ebi.ac.uk/Tools/st/emboss_backtranseq/). The
BmZuc coding sequence was synthesized by GeneArt Strings DNA
Fragments service (Life Technologies) and cloned into pIExZ vector. The
catalytic mutant BmZuc (H141N) was generated by site-directed mutagenesis.


EGFP-BmArmi K692A. The ATP-binding mutant EGFP-BmArmi K692A
was generated by site-directed mutagenesis into EGFP-BmArmi (a gift
from T. Kusakabe and T. Tatsuke).


Antibodies and western blotting 

Rabbit anti-Siwi, anti-BmAgo3, anti-BmZuc, anti-BmArmi, anti-BmG- 
PAT1, anti-BmGasz antibodies were generated by immunizing N-ter- 
minally His-tagged recombinant Siwi (aa 2-100), BmAgo3 (aa 2-100), 
BmZuc (aa 28-206), BmArmi (aa 2-294), BmGPAT1 (aa 602-870), 
BmGasz (aa 2-269), respectively (Scrum). The sera were affinity-purified 
on a column containing the immobilized recombinant protein. 
Anti-Trimmer and anti-BmPapi antibodies were described previously’. 
Anti-Flag (M2) (Sigma), anti-actin (Santa Cruz, sc-1616) and anti-GFP 
(B-2) (Santa Cruz) antibodies were purchased. Chemiluminescence 
was induced by Luminata Forte Western HRP Substrate (Millipore) 
and images were acquired by Amersham Imager 600 (GE Healthcare). 


In vitro processing assay 
In vitro ssRNA loading and trimming, NaIO4-mediated oxidation, and 
β-elimination were performed essentially as described previously”. 


Each substrate ssRNA was 5’-radiolabelled with T4 polynucleotide 
kinase (Takara) and [γ-32P]ATP (PerkinElmer). For BmZuc cleavage 
assay, Tri-KO cells were resuspended in hypotonic buffer (10 mM 
HEPES-KOH (pH 7.4), 10 mM KCl, 1.5 mM MgCl2, 1 mM DTT, 1x Complete 
EDTA-free protease inhibitor (Roche)) and incubated on ice for 20 min. 
Subsequently, the cell suspension was vortexed for 30 s, centrifuged 
at 1,000g for 20 min at 4 °C, and the supernatant was removed. The 
pellet was resuspended in hypotonic buffer and used as the 1,000g 
pellet fraction. Typically, 7 µl of the resuspended 1,000g pellet fraction 
was added to immunopurified Flag-Siwi-ssRNA complex on beads 
together with 3 µl of 40x reaction mix (containing ATP, ATP regeneration 
system, and RNase inhibitor)® and incubated at 25 °C for 2.5 h 
(Fig. 3c, e and Extended Data Fig. 4b, c) or 30 °C for 20 min (Fig. 3k), 
2 h (Fig. 4d and Extended Data Fig. 5c), 2.5 h (Fig. 3h, i), or 3 h (Fig. 3d and 
Extended Data Fig. 4f). For BmZuc cleavage assay in an ATP-depleted 
condition (Extended Data Fig. 4b), hypotonic buffer was added instead 
of 40x reaction mix. For standard trimming assay, Flag-Siwi-ssRNA 
complex on beads was incubated with 1,000g pellet from naive BmN4 
cells together with 40x reaction mix at 25 °C for 20 min (Extended Data 
Fig. 2f) or 30 °C for 1.5 h (Fig. 3i). In all the in vitro cleavage/trimming 
assays, lysates with an equal protein concentration were used in each 
experimental set. Images were acquired by Typhoon FLA 7000 (GE 
Healthcare) and analysed using Multi Gauge 3.0 (Fujifilm). 


RNAi in BmN4 cells 

For dsRNA preparation, template DNAs were prepared by PCR using 
primers containing T7 promoter listed in Supplementary Table 1. 
dsRNAs were transcribed using T7 Scribe Standard RNA IVT Kit (Cell 
Script) and purified with MEGAclear Transcription Clean-Up Kit 
(Thermo Fisher/Invitrogen). For dsRNA transfection, 5 µg of dsRNAs 
were transfected into BmN4 cells (6 x 10° cells per 10 cm dish) with 
X-tremeGENE HP DNA Transfection Reagent (Sigma). Transfection with dsRNAs 
was repeated every 3 days, four times in total. 


Immunoprecipitation 

For Siwi and BmAgo3 immunoprecipitation, cells were resuspended 
in buffer A (25 mM Tris-HCl (pH 7.6), 150 mM NaCl, 1.5 mM MgCl2, 0.2% 
sodium deoxycholate, 0.1% lithium dodecyl sulfate, 0.4% NP-40, 0.5 mM 
DTT, 1x Complete EDTA-free protease inhibitor (Roche)) and incubated 
on ice for 20 min. The cell suspension was diluted with equal volume 
of buffer A without detergents and centrifuged at 17,000g for 30 min 
at 4 °C. The supernatant was incubated with normal rabbit IgG (Cell 
Signaling), anti-Siwi or anti-BmAgo3 antibody at 4 °C for 1h, and then 
Dynabeads Protein G (Thermo Fisher/Invitrogen) was added. After 
incubation at 4 °C for 1h, the beads were washed with buffer B (25 mM 
Tris-HCl (pH 7.6), 150 mM NaCl, 1.5 mM MgCl2, 0.1% sodium deoxycholate, 
0.05% lithium dodecyl sulfate, 0.2% NP-40, 0.5 mM DTT, 1x Complete 
EDTA-free protease inhibitor (Roche)). The immunoprecipitated 
proteins were eluted with SDS sample buffer, and bound RNAs were 
purified with mirVana miRNA Isolation Kit (Thermo Fisher/Invitrogen) 
for small RNA library preparation or TRI Reagent (Molecular Research 
Center) for 5’ radiolabelling. 


Genome extraction, RNA extraction, quantitative real-time PCR 
and northern blotting 

Genomic DNA of BmN4 cells was extracted using NucleoSpin Tissue 
(Macherey-Nagel). Total RNAs prepared by TRI Reagent (Molecular 
Research Center) were used for real-time PCR and northern blotting. 
One microgram of total RNAs was reverse transcribed by PrimeScript 
RT reagent kit with gDNA eraser (Takara), and qRT-PCR was performed 
using KAPA SYBR FAST qPCR Master Mix (Kapa Biosystems) and the 
Thermal Cycler Dice Real Time System (Takara). For northern blotting, 
10-12 µg of total RNAs were resolved by 15% urea polyacrylamide 
gel electrophoresis (PAGE) and transferred to Hybond-N membrane 
(GE Healthcare). After chemical crosslinking”®, 5’ labelled antisense 
DNA probes were hybridized with Perfect Hyb Plus (Sigma) at 42 °C 
overnight. The primer sequences for genomic PCR, real-time PCR and 
DNA probes for Northern blotting are listed in Supplementary Table 1. 


Immunofluorescence 

Stable cells expressing GFP-tagged BmArmi were treated with 100 nM 
Mitotracker Red CMXRos (Cell Signaling) at 27 °C for 1 h. After fixing 
with 4% paraformaldehyde at room temperature for 10 min, the cells 
were permeabilized with 0.3% Triton X-100 for 5 min and incubated with 
PBS supplemented with 1% BSA (Sigma) and 0.1% Triton X-100 at room 
temperature for 1h. Then, the cells were incubated with anti-BmGasz 
antibody (1:400) in PBS supplemented with 1% BSA (Sigma) and 0.1% 
Triton X-100 at 4 °C overnight. Alexa Fluor 647 donkey anti-rabbit IgG 
antibody (Thermo Fisher/Invitrogen) was used as the secondary antibody. 
Images were captured using an Olympus FV3000 confocal laser 
scanning system with a 60x oil-immersion objective lens (PLAPON 
60XO, NA 1.42, Olympus) and processed with FV31S-SW Viewer software 
and Adobe Photoshop Elements 10. 


Construction of plasmid-based randomized sequence library 
and sequencing 

A DNA fragment containing an N35 random sequence embedded 
between two piRNA target sites was amplified by PCR using 3rand-const 
primers and synthetic N35-containing DNA oligos (IDT) (Siwi/BmAgo3- 
3endRANDSO-double) as the template (Supplementary Table 1). The 
PCR products were digested with BamHI and HindIII, and cloned into the 
BamHI/HindIII sites of pIEx-4 vector (Millipore/Novagen). The plasmid- 
based library expresses 35-nt of random sequence flanked by target 
sites for an abundant BmAgo3- (Fig. 4a, Extended Data Fig. 5d left, and 
Supplementary Note 2) or Siwi-dominant piRNA (Extended Data Fig. 5d 
right)*. The library was transfected into (1) Tri-KO cells (mock) or Tri- 
KO cells overexpressing either (2) wild-type BmZuc and BmArmi, to 
enhance BmZuc-mediated cleavage, or (3) the catalytic mutant BmZuc 
(HN), to repress BmZuc-mediated cleavage. Transcripts derived from 
the library are expected to be sliced by complementary piRNAs bound 
to BmAgo3/Siwi, loaded into Siwi/BmAgo3 via the ping-pong pathway, 
and cleaved by BmZuc within the downstream randomized region. 
We sequenced 20-50 nt small RNAs bearing the common sequence 
in their 5’ region, and restored the original sequences downstream of 
the obtained small RNAs by using the sequence data of the plasmid 
library. The variation of the randomized region was estimated to be 
215,879 for Siwi and 178,735 for BmAgo3 based on the number of dis- 
tinct sequences with RPM > 0.25 in the reference libraries. The rand- 
omized sequence libraries for reference were constructed by PCR using 
a common reverse primer (plasmid-Hind-randR) and a specific forward 
primer (plasmid-Siwi-randF-index12 or plasmid-Ago3-randF-index19, 
shown in Supplementary Table 1). The libraries were sequenced by the 
Illumina HiSeq 3000 platform to obtain 100-nt paired-end reads using 
a custom primer containing the consensus sequence (random plasmid 
sequence primer, shown in Supplementary Table 1) and an index read 
sequence primer provided by the manufacturer. 


Small RNA library preparation 

Small RNA libraries were prepared from 20-50 nt total, Siwi-bound, 
or BmAgo3-bound RNAs, according to the Zamore lab’s open protocol 
(https://www.dropbox.com/s/r5d7aj3hhyaborq/)” with some 
modifications. The 3’ adapter was conjugated with amino CA linker 
instead of dCC at the 3’ end (GeneDesign) and adenylated using 5’ 
DNA adenylation kit at the 5’ end (NEB). To reduce a ligation bias, 
four random nucleotides were included in the 3’ and 5’ adapters 
[(5’-rAppNNNNTGGAATTCTCGGGTGCCAAGG/amino CA linker-3’) and 
(5’-GUUCAGAGUUCUACAGUCCGACGAUCNNNN-3’)] and the adapter 
ligation was performed in the presence of 20% PEG-8000*, except for 
Fig. 1d and Extended Data Fig. 2l, 3c, 5a. After the 3’ adapter ligation at 
16 °C for > 16 h, RNAs were size-selected by urea PAGE. In Fig. 1d and 
Extended Data Fig. 2l, 3c, 5a, the 3’ and 5’ adapters without the four 
random nucleotides were used. In Fig. 4e and Extended Data Fig. 5e, 
the 5’ adapter without the four random nucleotides was used. For RNA 
extraction from polyacrylamide gel, ZR small-RNA PAGE Recovery Kit 
(ZYMO Research) was used. For small RNA library preparation from the 
randomized sequence library (Fig. 4e, Extended Data Fig. 5a, e), specific 
forward primer (piRNA-Siwi or BmAgo3-randF, shown in Supplemen- 
tary Table 1) was used in PCR to selectively amplify the plasmid-derived 
transcripts. Small RNA libraries were sequenced using the Illumina 
HiSeq 4000 platform to obtain 50-nt single-end reads. 


Sequence analysis of endogenous small RNAs 

After removal of adapter sequences by cutadapt”’, 20-45 nt reads with- 
out any ambiguous bases were mapped to sequences of defined piRNA 
loci? with Bowtie*? allowing one mismatch. Sam files were converted 
to bam files by SAMtools* and then to bed files by BEDTools”. Length 
and 5’-end position for each piRNA were obtained from bed files using 
custom R programs. To determine Siwi- and BmAgo3-dominant loci, 
Siwi-immunoprecipitated and BmAgo3-immunoprecipitated libraries 
treated with NaIO4 from naive BmN4 cells were compared to define 
Siwi-dominant piRNA loci (RPM (Siwi-IP) > RPM (BmAgo3-IP), n = 1,946) 
and BmAgo3-dominant piRNA loci (RPM (BmAgo3-IP) > RPM (Siwi-IP), 
n = 1,259) (Extended Data Fig. 3a). For mouse small RNA analysis, 
piRNA loci were defined from the deep sequencing data by Gainetdinov 
et al.*. In brief, nucleotides 1-23 of each read were extracted and the 
frequency of each 23-nt sequence was calculated in each library 
(SRR7760309, SRR7760310, SRR7760317, SRR7760318, SRR7760321, 
SRR7760322, SRR7760343, SRR7760344, SRR7760347, SRR7760348, 
SRR7760369, SRR7760370, SRR7760373, SRR7760374, SRR7760377, 
SRR7760378). Sequences that are abundantly found (RPM >10) in at 
least one library, 36,431 in total, were selected and mapped to the mouse 
genome (GRCm38.p5) to define representative piRNA loci in pachytene 
spermatocytes. 
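
The prefix-based selection for the mouse data can be illustrated with a minimal Python sketch. It is not the authors' code (their analysis used custom R programs): the function names are hypothetical, and normalizing RPM to the number of reads that are at least 23 nt long is an assumption made only for this example.

```python
from collections import Counter


def prefix_rpm(reads, prefix_len=23):
    """Return {prefix: reads per million} for one library.

    Assumption: RPM is normalized to the number of reads that are
    long enough to yield a full prefix.
    """
    prefixes = [r[:prefix_len] for r in reads if len(r) >= prefix_len]
    total = len(prefixes)
    if total == 0:
        return {}
    counts = Counter(prefixes)
    return {p: c * 1_000_000 / total for p, c in counts.items()}


def abundant_prefixes(libraries, min_rpm=10.0, prefix_len=23):
    """Collect prefixes whose RPM exceeds min_rpm in at least one library."""
    selected = set()
    for reads in libraries:
        for prefix, rpm in prefix_rpm(reads, prefix_len).items():
            if rpm > min_rpm:
                selected.add(prefix)
    return selected
```

Here `libraries` is a list of read lists, one per sequencing run; the selected prefixes would then be mapped to the genome with an aligner of choice.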


Sequence analysis of randomized plasmids and small RNAs 
Sequences in the randomized region of the plasmids were extracted 
and the frequency of each distinct sequence was calculated. Those 
sequences with RPM >4 (80,966 species for Siwi and 70,442 species for 
BmAgo3) were used as references. For analysing small RNAs derived 
from the randomized libraries, adapters were trimmed by cutadapt”” 
and 20-45 nt reads without any ambiguous bases but with a 15-nt common 
sequence from the plasmids were mapped to the randomized 
sequence references with Bowtie” allowing no mismatch. Sam files 
were converted to bam files by SAMtools* and then to bed files by BED- 
Tools. Length and 5’-end position for each small RNA were obtained 
from bed files using custom R programs. To analyse the nucleotide 
frequencies relative to the 3’ end of small RNAs, we selected sequences 
with peak lengths of 31-44 nt, corresponding to the size range of type-E 
pre-piRNAs. 
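
The two filters described above (reference selection by RPM and read selection by length, ambiguity and common sequence) might be sketched in Python as follows. This is an illustration under stated assumptions, not the published pipeline: the function names are hypothetical, the common sequence is passed in by the caller, and its position within the read is not checked.

```python
def select_references(counts, min_rpm=4.0):
    """Keep randomized-region sequences whose RPM exceeds min_rpm.

    counts: {sequence: raw read count} from the plasmid library.
    """
    total = sum(counts.values())
    return {seq for seq, c in counts.items()
            if c * 1_000_000 / total > min_rpm}


def usable_read(read, common_seq, min_len=20, max_len=45):
    """True if a trimmed read passes all three filters.

    Assumption: the common sequence may occur anywhere in the read.
    """
    return (min_len <= len(read) <= max_len
            and "N" not in read          # no ambiguous bases
            and common_seq in read)      # plasmid-derived transcript
```

Reads that pass `usable_read` would then be aligned to the selected references with no mismatches allowed.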


Calculation of the similarity scores with weighted BmZuc motifs 
The top 15 highest-frequency nucleotides in the randomized Siwi- 
immunoprecipitated or BmAgo3-immunoprecipitated libraries 
(Fig. 4e) were chosen to define the ‘weighted BmZuc motif’ of 17-nt 
long (from -12 to +4, 0 = predicted BmZuc cleavage site) for Siwi or 
BmAgo3 (Extended Data Fig. 5g, upper), with each nucleotide having 
a score of the log2 value of the nucleotide frequency at that position, 
normalized to the randomized sequence references. The similarity 
score was calculated by summing the weights of the BmZuc motif at every 
position where the nucleotide in the sequence matches the nucleotide at 
the corresponding position in the defined BmZuc motif for Siwi or BmAgo3 
(Extended Data Fig. 5g, lower), sliding the 17-nt window along each 27-nt 
sequence in the extracted genomic sequence pools or the control shuffled 
pool, using custom R programs. The control shuffled sequences 
have the average nucleotide composition of the silkworm genome 
corresponding to positions 11-45 of piRNA loci (U[T]:G:C:A = 
25.7:23.4:21.8:29.1)”. For Fig. 4f, 27-nt genomic sequence pools corresponding 
to positions 19-45 nt of 1,946 Siwi-dominant and 1,259 BmAgo3- 
dominant piRNA loci (Extended Data Fig. 3a) were extracted and used 
to calculate the similarity score between the weighted BmZuc motif 
for Siwi or BmAgo3 and each extracted genomic sequence from the 
Siwi- or BmAgo3-dominant piRNA loci by a sliding window approach 
(Extended Data Fig. 5g, lower). 
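
The sliding-window scoring can be sketched in Python as below. This is a minimal illustration, not the authors' R implementation. The motif is represented as a hypothetical dictionary that maps a window index (0 to 16, that is, motif position + 12) to a (nucleotide, weight) pair; any weights supplied to it are placeholders rather than values from the paper. Taking the maximum over all windows is an assumption here, chosen because it is the quantity plotted in Extended Data Fig. 5i.

```python
def window_score(window, motif):
    """Sum the weights of the motif positions that match the window.

    motif: {index_in_window: (nucleotide, weight)}; index 0 corresponds
    to motif position -12 and index 16 to position +4.
    """
    return sum(weight for idx, (base, weight) in motif.items()
               if window[idx] == base)


def max_similarity_score(seq, motif, window_len=17):
    """Slide a window along seq and return the highest window score."""
    if len(seq) < window_len:
        raise ValueError("sequence is shorter than the window")
    return max(window_score(seq[i:i + window_len], motif)
               for i in range(len(seq) - window_len + 1))
```

For a 27-nt sequence and a 17-nt window this evaluates 11 window positions. Genomic sequences use T where the RNA has U, so the motif nucleotides must be given in the same alphabet as the input.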


Statistics and reproducibility 

Experiments in Figs. 1a, 3c-f, Extended Data Figs. 2d-g, 4d, f were 
independently performed twice with similar results. Experiments in Figs. 
2b, c, 3g-i, 4d and Extended Data Figs. 2b, h, 3i, 4b, c, 5c were performed 
once. For the statistical analyses in Figs. 2a, 3l, Extended Data Figs. 3f, 5i, 
detailed statistical values are summarized in Supplementary Table 2. 
To estimate effect sizes in Fig. 3l, the cohensD function in the lsr 
package was used. To estimate effect sizes in Fig. 2a and Extended 
Data Figs. 3f, 5i, the wilcoxsign_test and wilcox_test functions in the coin 
package were used. No statistical methods were used to predetermine 
sample size. The experiments were not randomized and no blinding was 
used during data analysis. 
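
For readers unfamiliar with the effect-size measure named above, a minimal Python sketch of Cohen's d for two independent samples is given here. It is an illustration only: the analysis in the paper was done in R, the function name `cohens_d` is hypothetical, and the pooled standard deviation and absolute value are choices made for this example.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d using a pooled standard deviation.

    Returns the absolute standardized difference between the two means.
    """
    nx, ny = len(x), len(y)
    pooled_var = (((nx - 1) * variance(x) + (ny - 1) * variance(y))
                  / (nx + ny - 2))
    return abs(mean(x) - mean(y)) / math.sqrt(pooled_var)
```

Each sample needs at least two values, because the sample variance is undefined otherwise.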


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The sequencing data reported in this paper are publicly available in 
DDBJ, under the accession number DRA008544. All other data are available 
from the authors upon reasonable request. 
All code required for bioinformatics analysis in this paper is available 
at https://github.com/kshoji-nt/BmZuc_cleavage. 
Extended Data Fig. 1 | The current model for the piRNA biogenesis initiated 
by the piRNA-guided PIWI-catalysed slicing in animal germ cells. The ping-pong 
cycle produces pairs of piRNAs (‘initiator’ and ‘responder’ piRNAs) that 
show 10-nt overlapping at their 5’ ends as well as multiple ‘trailing’ piRNAs 
downstream of the responder piRNAs (top). The ping-pong cycle is initiated by 
the slicing of a precursor transcript by an initiator piRNA-loaded PIWI protein. 
The PIWI-cleaved 5’-monophosphorylated fragment is handed over to a 
corresponding PIWI protein as a pre-pre-piRNA. Then, the PIWI-loaded 
pre-pre-piRNA is endonucleolytically cleaved at a downstream position. The 
resultant 5’ cleavage fragment, called a pre-piRNA, is further trimmed by the 
3’-to-5’ exonuclease Trimmer (PNLDC1 in mouse) to the mature length, 
2’-O-methylated by the methyltransferase Hen1 (HENMT1 in mouse), and becomes 
a responder piRNA (left pathway). Hen1-mediated 2’-O-methylation protects 
mature piRNAs from degradation and tightens their binding to PIWI 
proteins. The 3’ endonucleolytic cleavage fragment of the pre-pre-piRNA 
is loaded into a next PIWI protein as a new pre-pre-piRNA and 
endonucleolytically cleaved again at a downstream position, producing a new 
PIWI-loaded pre-piRNA. This pre-piRNA is then processed by Trimmer and 
Hen1 at the 3’ end into a mature trailing piRNA (middle pathway). The 3’ 
cleavage fragment of the second endonucleolytic cleavage is also loaded into a 
next PIWI protein, serving as a new pre-pre-piRNA. As a result, a series of 
trailing piRNAs are consecutively produced downstream of the responder 
piRNA (right pathway). These two piRNA biogenesis pathways lead to 
target-dependent amplification of piRNAs (via the ping-pong cycle) and 
expansion of piRNA sequences (via trailing piRNA production). 
Extended Data Fig. 2 | Generation and characterization of Trimmer-knockout 
cells. a, Schematic representation of the domain structure of 
Trimmer and the position of the sgRNA target site for CRISPR-Cas9. b, Genomic 
PCR of a region including the sgRNA target site. In addition to the main PCR 
product (ii), two additional PCR products (i and iii) were detected only in 
Tri-KO#4 cells. Detailed genome sequences are shown in c. c, Genome sequences 
around the sgRNA target site in naive or Tri-KO#4 BmN4 cells. Genomic 
sequencing revealed various mutations at the sgRNA target site, suggesting a 
polyploid nature of the trimmer locus and/or imperfect cell cloning. d, Western 
blot analysis of Trimmer in two different Tri-KO cell lines (#4 and #6). Tri-KO line 
#4 was used in this study. e, Western blot analysis of whole-cell lysate from 
naive or Tri-KO#4 BmN4 cells. f, In vitro trimming assay for Siwi-loaded 1U50 
RNA using 1,000g ppt. from naive or two different Tri-KO cell lines. ppt., pellet. 
g, SYBR Gold staining of total RNAs from naive or three different Tri-KO cell 
lines (#3, #4 and #6). h, Total RNAs extracted from Tri-KO #4 cells 
overexpressing wild-type Trimmer (WT) or its catalytic mutant E30A (EA) were 
5’ radiolabelled and detected by phosphor imaging. Mock indicates 
transfection of a control plasmid. Trimmer expression was analysed by western 
blotting (upper). i, Length distribution of small RNAs mapped to 3,236 piRNA 
loci in NaIO4-treated small RNA library from naive or Tri-KO BmN4 cells. 
j, Relative fraction of 2’-O-methylated Tri-KO small RNAs in each length. k, Peak 
length distribution of piRNAs mapped to 3,236 piRNA loci in the NaIO4-treated 
library from naive BmN4 cells. l, Changes caused by the depletion of BmZuc in the 
length distribution of type-N (lower) or type-E (upper) NaIO4-treated small 
RNAs in Tri-KO cells. 
Extended Data Fig. 3 | BmZuc is required to produce type-E pre-piRNAs for 
both Siwi and BmAgo3, whereas trailing piRNA production is largely 
restricted to Siwi. a, Scatter plot showing normalized piRNA abundance 
co-immunoprecipitated with Siwi or BmAgo3 from naive BmN4 cells for each 
piRNA locus. Green dots, Siwi-dominant piRNA loci (n = 1,946); purple dots, 
BmAgo3-dominant piRNA loci (n = 1,259). b, Peak length frequency of Tri-KO 
small RNAs for Siwi-dominant (left) or BmAgo3-dominant (right) piRNA loci. 
c, Length distribution of Tri-KO small RNAs bearing the peak length of 35 or 
36 nt (type-E) for Siwi-dominant (upper) or BmAgo3-dominant (lower) piRNA 
loci. BmZuc knockdown abolished small RNAs with the peak lengths. 
Zn denotes the z score at position n (c, e). d, Siwi-dominant type-E pre-piRNAs 
show a stronger +1U preference than BmAgo3-dominant ones. e, Siwi-dominant 
type-E pre-piRNAs show a greater tendency to have downstream trailing 
piRNAs than BmAgo3-dominant ones. f, The 5’ ends of piRNAs mapped to 
20-100 nt downstream of type-N (top) or type-E (bottom) Tri-KO small RNAs 
were mapped on the antisense strand, separately for Siwi-dominant (left) and 
BmAgo3-dominant (right) piRNA loci. Type-N piRNAs have more antisense 
piRNAs at ~41-52 nt from the 5’ ends than type-E piRNAs, regardless of which 
PIWI protein they bind (two-sided Wilcoxon signed rank test, n = 12). g, Four 
representative type-E (piRNA-1528 and 66) and type-N (piRNA-2986 and 304) 
piRNA loci and their downstream genomic regions were mapped with the 5’ 
ends of sense (grey) and antisense (red) piRNAs. h, Distribution of type-E and 
type-N piRNAs mapped to a transposon called MER8S. i, An example of mixed 
modes of pre-piRNA production. Pre-pre-piRNA-1249 contains a BmZuc 
cleavage site and a slicing site by an antisense piRNA-loaded PIWI protein. The 
~35 nt BmZuc cleavage product, but not the 59 nt slicing product, is 
2’-O-methylated. An unmethylated ~75 nt fragment, which is possibly produced by 
another antisense piRNA-guided slicing, locates in an unannotated genomic 
region and cannot be assigned. 
Extended Data Fig. 4 | BmZuc-mediated cleavage of Siwi-loaded pre-pre- 
piRNAs in vitro. a, Detailed protocol for in vitro recapitulation of BmZuc- 
mediated cleavage of Siwi-loaded pre-pre-piRNAs. b, Siwi-loaded 5. .,)U RNA 
was incubated with Tri-KO 1,000g ppt. overexpressing BmZuc WT or HN, with 
or without ATP and the ATP-regeneration system. Mock indicates transfection 
of a control plasmid. c, Siwi-loaded 111750 RNA was incubated with Tri-KO 
1,000g ppt. depleted of the indicated protein by RNAi. Mock indicates RNAi 
against Renilla luciferase (in c-e). d, Confocal images of BmN4 cells stably 
expressing GFP-tagged BmArmi in the presence or absence of BmGasz (scale 
bars, 10 µm). e, Quantitative real-time PCR analysis of BmArmi and BmGPAT1. 
Tri-KO cells were depleted of BmGPAT1 or BmGasz by RNAi, and the mRNA 
levels for BmArmi or BmGPAT1 were analysed by real-time PCR. The graph 
shows the average of two independent experiments. f, Siwi-loaded 1U50 RNA 
was incubated with Tri-KO 1,000g ppt. overexpressing BmZuc and BmArmi, or 
BmZuc HN. Mock indicates transfection of a control plasmid. 1U50 RNA was 
cleaved at multiple sites within a region that is devoid of U in a manner dependent 
on the BmZuc activity. ppt., pellet. 
Extended Data Fig. 5 | Calculation of BmZuc scores for Siwi or BmAgo3 based 
on the randomized sequence library analysis. a, Peak length distribution of 
small RNAs derived from the randomized sequence library. b, RNA substrates 
used in c. The top 6 nucleotides in the BmZuc motif are shown in colour and 
their mutations are shown in black. c, Siwi-loaded 111750-derived RNAs were 
incubated with Tri-KO 1,000g ppt. overexpressing BmZuc and BmArmi, or 
BmZuc(HN). Each gel image was adjusted to equalize the loading signal. ppt., 
pellet. d, Schematic representation of the randomized sequence library 
analysis for Siwi- or BmAgo3-loaded pre-piRNAs cleaved by BmZuc. e, Peak 
length distribution of Siwi- or BmAgo3-bound 2’-O-methylated small RNAs 
derived from the corresponding randomized sequence library. For Siwi 
immunoprecipitation, the same plasmid library as in a was used. f, Nucleotide 
composition around the 3’ ends of mature piRNAs in naive BmN4 cells (right) or 
type-E pre-piRNAs in Tri-KO cells (left), separately analysed for Siwi-dominant 
(top) and BmAgo3-dominant (bottom) piRNA loci. The 6 nucleotides in the 
BmZuc motif are highlighted. g, Schematic explanation for the weighted 
BmZuc motif (top) and the calculation of the BmZuc score in the 17-nt sliding 
window analysis (bottom). h, Similarity scores with the weighted BmZuc motif 
(BmZuc score) for Siwi were calculated for 111750 RNA and their mutant 
sequences in sliding windows and plotted as in c. i, Box plots show the 
maximum similarity scores with the weighted BmZuc motif for Siwi or BmAgo3 
within the positions of 19-45 nt of Siwi-dominant (top) or BmAgo3-dominant 
(bottom) type-N or type-E piRNA loci or the shuffled control sequences (a pool 
of 3,236 species of 27-nt scrambled sequences that have the average nucleotide 
composition of the silkworm genome). Type-E piRNA loci have significantly 
higher BmZuc scores than the shuffled control sequences for both Siwi- and 
BmAgo3-dominant piRNAs (Mann-Whitney U test). Centre line, median; box 
limits, upper and lower quartiles; whiskers, 1.5 x interquartile range; points, 
outliers. 
Extended Data Fig. 6 | Nucleotide preference around the cleavage site by 
mouse MitoPLD. a, Peak length distribution of 2’-O-methylated small RNAs in 
wild-type (WT) or Pnldc1-/- mouse secondary spermatocytes. Data are from ref. 
2. b, Nucleotide composition around the 3’ end of small RNAs in WT (left) or 
Pnldc1-/- mice. The 3’ ends of pre-piRNAs in Pnldc1-/- mice exhibit strong +1U 
bias. In addition, -9A and -3C preferences, which are similar to the BmZuc 
motif (Fig. 4e), are observed in fully elongated 37-42 nt pre-piRNAs in 
Pnldc1-/- mice (right). Data are from ref. ?. 
Panel labels recovered from the figure: 3’-end; (23-36 nt); Pnldc1-/- NaIO4 (+) (37-42 nt). 
The accession number for the sequencing data is DDBJ: DRA008549. 



Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems Methods 
n/a | Involved in the study n/a | Involved in the study 
[| Antibodies [| ChIP-seq 
[| Eukaryotic cell lines [| Flow cytometry 
[| Palaeontology [| MRI-based neuroimaging 


[_] Animals and other organisms 


[] Human research participants 


[| Clinical data 


Antibodies 


Antibodies used [Primary antibodies] 
Anti-Siwi, BmAgo3, BmZuc, BmArmi, BmGPAT1, BmGasZ, Trimmer, BmPapi antibodies were generated by immunizing rabbits. 
Anti-FLAG (M2, Sigma #F1804), Actin (Santa Cruz #sc-1616), GFP (B-2, Santa Cruz #sc-9996) antibodies were purchased. 
For western blotting, each antibody was used at the following dilution [anti-Siwi (1:2000), BmAgo3 (1:2000), BmZuc (1:200), 
BmArmi (1:1000), BmGPAT1 (1:2000), BmGasZ (1:2000), Trimmer (1:1000), BmPapi (1:1000), FLAG (1:2000), Actin (1:2000), GFP 
(1:2000)]. For immunofluorescence, anti-BmGasZ antibody was used at 1:400. 
[Secondary antibodies] 
Peroxidase AffiniPure F(ab')2 Fragment Goat Anti-Rabbit IgG, Fc fragment specific (Jackson ImmunoResearch #111-036-046 ; 
1:5000) 
Peroxidase AffiniPure Goat Anti-Mouse IgG, light chain specific (Jackson ImmunoResearch #115-035-174 ; 1:5000) 
Peroxidase AffiniPure Donkey Anti-Goat IgG (H+L) (Jackson ImmunoResearch #705-035-003 ; 1:5000) 
Alexa Fluor 647 donkey anti-rabbit IgG antibody (ThermoFisher/Invitrogen # A-31573; 1:4000) 


Validation [Primary antibodies] 
Anti-Siwi, BmAgo3, BmZuc, BmArmi, BmGPAT1, BmGasZ antibodies were validated by Western blotting of a knockdown lysate of 
BmN4 (silkworm) in this paper (Fig. 3f). Anti-Trimmer, BmPapi antibodies were validated by Western blotting of a knockdown 
lysate of BmN4 (silkworm) in a previous study (2016 Cell 164, 962-973 Izumi et al.). 

For the following purchased antibodies, see manufacturer information: 

Anti-FLAG antibody (https://www.sigmaaldrich.com/content/dam/sigma-aldrich/docs/Sigma/Bulletin/f1804bul.pdf) 

Anti-GFP antibody (https://datasheets.scbt.com/sc-9996.pdf) 

Anti-Actin antibody (https://search.cosmobio.co.jp/cosmo_search_p/search_gate2/docs/SCB_/SC1616.20070823.pdf) 

species: mouse, rat, human, zebrafish, C. elegans, Drosophila, S. cerevisiae and Xenopus 

applications: Western blotting, immunoprecipitation, immunofluorescence, and flow cytometry 

[Secondary antibodies] 

Peroxidase AffiniPure F(ab')2 Fragment Goat Anti-Rabbit IgG, Fc fragment specific (https://www.jacksonimmuno.com/catalog/ 
products/111-036-046) 

Peroxidase AffiniPure Goat Anti-Mouse IgG, light chain specific (https://www.jacksonimmuno.com/catalog/ 
products/115-035-174) 

Peroxidase AffiniPure Donkey Anti-Goat IgG (H+L) (https://www.jacksonimmuno.com/catalog/products/705-035-003) 

Alexa Fluor 647 donkey anti-rabbit IgG antibody (https://www.thermofisher.com/antibody/product/Donkey-anti-Rabbit-lgG-H-L- 
Highly-Cross-Adsorbed-Secondary-Antibody-Polyclonal/A-31573) 
Eukaryotic cell lines 


Policy information about cell lines 


Cell line source(s) BmN4 cells are provided from Dr. Kusakabe, Kyushu University 
Authentication BmN4 cells have not been authenticated. 
Mycoplasma contamination Not tested 


Commonly misidentified lines No commonly misidentified cell lines were used. 
(See ICLAC register) 


Palaeontology 


Specimen provenance Provide provenance information for specimens and describe permits that were obtained for the work (including the name of the 
issuing authority, the date of issue, and any identifying information). 


Specimen deposition Indicate where the specimens have been deposited to permit free access by other researchers. 
Dating methods If new dates are provided, describe how they were obtained (e.g. collection, storage, sample pretreatment and measurement),
where they were obtained (i.e. lab name), the calibration program and the protocol for quality assurance OR state that no new
dates are provided.


[ ] Tick this box to confirm that the raw and calibrated dates are available in the paper or in Supplementary Information.


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals For laboratory animals, report species, strain, sex and age OR state that the study did not involve laboratory animals. 


Wild animals Provide details on animals observed in or captured in the field; report species, sex and age where possible. Describe how animals 
were caught and transported and what happened to captive animals after the study (if killed, explain why and describe method, if 
released, say where and when) OR state that the study did not involve wild animals 


Field-collected samples For laboratory work with field-collected samples, describe all relevant parameters such as housing, maintenance, temperature, 
photoperiod and end-of-experiment protocol OR state that the study did not involve samples collected from the field. 


Ethics oversight Identify the organization(s) that approved or provided guidance on the study protocol, OR state that no ethical approval or 
guidance was required and explain why not. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Human research participants 


Policy information about studies involving human research participants 


Population characteristics Describe the covariate-relevant population characteristics of the human research participants (e.g. age, gender, genotypic 
information, past and current diagnosis and treatment categories). If you filled out the behavioural & social sciences study design 
questions and have nothing to add here, write "See above." 


Recruitment Describe how participants were recruited. Outline any potential self-selection bias or other biases that may be present and how 
these are likely to impact results. 


Ethics oversight Identify the organization(s) that approved the study protocol. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Clinical data 


Policy information about clinical studies 
All manuscripts should comply with the ICMJE guidelines for publication of clinical research and a completed CONSORT checklist must be included with all submissions. 
Clinical trial registration Provide the trial registration number from ClinicalTrials.gov or an equivalent agency.

Study protocol Note where the full trial protocol can be accessed OR if not available, explain why. 

Data collection Describe the settings and locales of data collection, noting the time periods of recruitment and data collection. 

Outcomes Describe how you pre-defined primary and secondary outcome measures and how you assessed these measures. 
ChIP-seq 


Data deposition 
[ ] Confirm that both raw and final processed data have been deposited in a public database such as GEO.

[ ] Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.


Data access links (may remain private before publication) For "Initial submission" or "Revised version" documents, provide reviewer access links.
For your "Final submission" document, provide a link to the deposited data.

Files in database submission Provide a list of all files available in the database submission.

Genome browser session (e.g. UCSC) Provide a link to an anonymized genome browser session for "Initial submission" and "Revised version"
documents only, to enable peer review. Write "no longer applicable" for "Final submission" documents.


Methodology 


Replicates Describe the experimental replicates, specifying number, type and replicate agreement. 


Sequencing depth Describe the sequencing depth for each experiment, providing the total number of reads, uniquely mapped reads, length of 
reads and whether they were paired- or single-end. 


Antibodies Describe the antibodies used for the ChIP-seq experiments; as applicable, provide supplier name, catalog number, clone 
name, and lot number. 


Peak calling parameters Specify the command line program and parameters used for read mapping and peak calling, including the ChIP, control and 
index files used. 


Data quality Describe the methods used to ensure data quality in full detail, including how many peaks are at FDR 5% and above 5-fold 
enrichment. 


Software Describe the software used to collect and analyze the ChIP-seq data. For custom code that has been deposited into a
community repository, provide accession details. 


Flow Cytometry 
Plots 


Confirm that: 


[ ] The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).

[ ] The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).

[ ] All plots are contour plots with outliers or pseudocolor plots.

[ ] A numerical value for number of cells or percentage (with statistics) is provided.


Methodology 


Sample preparation Describe the sample preparation, detailing the biological source of the cells and any tissue processing steps used. 


Instrument Identify the instrument used for data collection, specifying make and model number. 


Software Describe the software used to collect and analyze the flow cytometry data. For custom code that has been deposited into a 
community repository, provide accession details. 


Cell population abundance Describe the abundance of the relevant cell populations within post-sort fractions, providing details on the purity of the samples
and how it was determined. 


Gating strategy Describe the gating strategy used for all relevant experiments, specifying the preliminary FSC/SSC gates of the starting cell 
population, indicating where boundaries between "positive" and "negative" staining cell populations are defined. 


[| Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 


Magnetic resonance imaging 


Experimental design 


Design type Indicate task or resting state; event-related or block design. 




Design specifications Specify the number of blocks, trials or experimental units per session and/or subject, and specify the length of each trial 
or block (if trials are blocked) and interval between trials. 


Behavioral performance measures State number and/or type of variables recorded (e.g. correct button press, response time) and what statistics were used
to establish that the subjects were performing the task as expected (e.g. mean, range, and/or standard deviation across
subjects).
Acquisition 

Imaging type(s) Specify: functional, structural, diffusion, perfusion. 

Field strength Specify in Tesla 

Sequence & imaging parameters Specify the pulse sequence type (gradient echo, spin echo, etc.), imaging type (EPI, spiral, etc.), field of view, matrix size, 
slice thickness, orientation and TE/TR/flip angle. 

Area of acquisition State whether a whole brain scan was used OR define the area of acquisition, describing how the region was determined. 

Diffusion MRI [ ] Used [ ] Not used


Preprocessing 


Preprocessing software Provide detail on software version and revision number and on specific parameters (model/functions, brain extraction, 
segmentation, smoothing kernel size, etc.). 


Normalization If data were normalized/standardized, describe the approach(es): specify linear or non-linear and define image types 
used for transformation OR indicate that data were not normalized and explain rationale for lack of normalization. 


Normalization template Describe the template used for normalization/transformation, specifying subject space or group standardized space (e.g. 
original Talairach, MNI305, ICBM152) OR indicate that the data were not normalized. 


Noise and artifact removal Describe your procedure(s) for artifact and structured noise removal, specifying motion parameters, tissue signals and 
physiological signals (heart rate, respiration). 


Volume censoring Define your software and/or method and criteria for volume censoring, and state the extent of such censoring. 


Statistical modeling & inference 


Model type and settings Specify type (mass univariate, multivariate, RSA, predictive, etc.) and describe essential details of the model at the first 
and second levels (e.g. fixed, random or mixed effects; drift or auto-correlation). 


Effect(s) tested Define precise effect in terms of the task or stimulus conditions instead of psychological concepts and indicate whether 
ANOVA or factorial designs were used. 


Specify type of analysis: [ ] Whole brain [ ] ROI-based [ ] Both


Statistic type for inference Specify voxel-wise or cluster-wise and report all relevant parameters for cluster-wise methods.
(See Eklund et al. 2016)

Correction Describe the type of correction and how it is obtained for multiple comparisons (e.g. FWE, FDR, permutation or Monte
Carlo).


Models & analysis 


n/a | Involved in the study
[ ] | [ ] Functional and/or effective connectivity
[ ] | [ ] Graph analysis
[ ] | [ ] Multivariate modeling or predictive analysis


Functional and/or effective connectivity Report the measures of dependence used and the model details (e.g. Pearson correlation, partial 
correlation, mutual information). 


Graph analysis Report the dependent variable and connectivity measure, specifying weighted graph or binarized graph, 
subject- or group-level, and the global and/or node summaries used (e.g. clustering coefficient, efficiency, 
etc.). 


Multivariate modeling and predictive analysis Specify independent variables, features extraction and dimension reduction, model, training and evaluation 
metrics. 
Axel Mogk² & Sander J. Tans¹,³*


The ability to reverse protein aggregation is vital to cells. Hsp100 disaggregases
such as ClpB and Hsp104 are proposed to catalyse this reaction by translocating
polypeptide loops through their central pore. This model of disaggregation is
appealing, as it could explain how polypeptides entangled within aggregates can be
extracted and subsequently refolded with the assistance of Hsp70. However, the
model is also controversial, as the necessary motor activity has not been identified
and recent findings indicate non-processive mechanisms such as entropic pulling or
Brownian ratcheting. How loop formation would be accomplished is also obscure.
Indeed, cryo-electron microscopy studies consistently show single polypeptide
strands in the Hsp100 pore. Here, by following individual ClpB-substrate
complexes in real time, we unambiguously demonstrate processive translocation of
looped polypeptides. We integrate optical tweezers with fluorescent-particle tracking
to show that ClpB translocates both arms of the loop simultaneously and switches to
single-arm translocation when encountering obstacles. ClpB is notably powerful and
rapid; it exerts forces of more than 50 pN at speeds of more than 500 residues
per second in bursts of up to 28 residues. Remarkably, substrates refold while exiting
the pore, analogous to co-translational folding. Our findings have implications for
protein-processing phenomena including ubiquitin-mediated remodelling by Cdc48
(or its mammalian orthologue p97) and degradation by the 26S proteasome.


We studied the disaggregase ClpB, a member of the Hsp100 chaperone family, using
single-molecule techniques. Maltose-binding protein (MBP) was coupled to DNA handles
at both termini and tethered between polystyrene beads, which were trapped and
manipulated with optical tweezers (Fig. 1a). After mechanical unfolding of the protein
(Fig. 1a, Extended Data Fig. 1a), the applied force was reduced to a value
between 5 and 10 pN, high enough to prevent spontaneous refolding
(Fig. 1a). Addition of ATP and ClpB(Y503D), a mutant altered in the
regulatory middle (M) domain that does not require Hsp70 (DnaK)
binding for ATPase activation, resulted in isolated episodes of contraction
in the bead-to-bead distance (Fig. 1b). Zooming in showed that
the effective polypeptide contour length Lₑ was initially approximately
360 amino acids (aa), as expected for fully unfolded MBP, and then
decreased linearly to 0, indicating that the C and N termini of MBP
were directly adjacent to each other (Fig. 1c, Extended Data Fig. 1b).
After a brief pause, Lₑ increased abruptly back to 360 residues and then
immediately decreased again (Fig. 1c). ClpB thus produced processive
substrate translocation runs that ended with a loss of ClpB grip. This
in turn caused the substrate to slip and be pulled back by the applied
force, and hence enabled a new run to start.
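
The conversion from a measured extension and force to a contour length can be illustrated with a worm-like chain calculation. The sketch below is a minimal illustration, not the authors' analysis code: it uses the Marko-Siggia interpolation formula, ignores the DNA handles, and assumes a polypeptide persistence length of 0.75 nm and 0.365 nm of contour per residue. These parameter values and all function names are assumptions made for the example.

```python
KT_PN_NM = 4.11          # thermal energy at about 298 K, in pN nm
PERSISTENCE_NM = 0.75    # assumed polypeptide persistence length (not from the paper)
NM_PER_RESIDUE = 0.365   # assumed contour length per amino acid (not from the paper)


def wlc_force(rel_ext, persistence_nm=PERSISTENCE_NM):
    """Marko-Siggia interpolation: force (pN) at relative extension x/L in [0, 1)."""
    return (KT_PN_NM / persistence_nm) * (
        0.25 / (1.0 - rel_ext) ** 2 - 0.25 + rel_ext
    )


def relative_extension(force_pn, persistence_nm=PERSISTENCE_NM):
    """Invert wlc_force by bisection; the force rises monotonically with x/L."""
    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if wlc_force(mid, persistence_nm) < force_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def contour_length_aa(extension_nm, force_pn):
    """Contour length in residues for a chain at extension_nm under force_pn (> 0)."""
    contour_nm = extension_nm / relative_extension(force_pn)
    return contour_nm / NM_PER_RESIDUE


if __name__ == "__main__":
    # End-to-end extension of a fully unfolded 360-residue chain held at 8 pN.
    full_nm = 360 * NM_PER_RESIDUE * relative_extension(8.0)
    print(round(contour_length_aa(full_nm, 8.0)))        # 360
    print(round(contour_length_aa(0.5 * full_nm, 8.0)))  # 180
```

At constant force the relative extension is fixed, so a halving of the measured extension maps directly onto a halving of the contour length, which is why a linear decrease in distance reads out as a linear decrease in residues.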

Translocation was abolished by using ADP instead of ATP; when
either of the two ClpB ATPase catalytic centres was mutated (E279A or E678A),
preventing ATP hydrolysis; when the substrate-contacting
pore loops were mutated (Y251A or Y653A); or by deletion of the
N-terminal domain that forms the pore entry. Translocation was also
observed for the M-domain mutant K476C and for wild-type ClpB
with the Hsp70 (DnaK in bacteria) system (Extended Data Fig. 2a-e).
ClpB(K476C) and wild-type ClpB translocated at the same speed as
ClpB(Y503D), which was unexpected because both stimulated ATP
hydrolysis less strongly in bulk (Extended Data Fig. 2f, g). However,
they exhibited translocation for a smaller proportion of the time
(Extended Data Fig. 2b), suggesting that the differences in hydrolysis
rates reflected the fraction of actively translocating ClpB hexamers
rather than their individual translocation speed.

Longer polypeptide constructs of two and four tandem MBP repeats
displayed longer runs before slipping, with some exceeding 1,000
residues (Extended Data Figs. 1c-f, 3, 4). The speed distribution
displayed two peaks (the second at roughly double the speed of the
first) and extended beyond 500 aa per second (Fig. 1d), more than
tenfold faster than other peptide translocases. This distribution
appeared similar for the different substrate constructs and for different
individual translocation bursts (Fig. 1b, Extended Data Fig. 3d-f).
These bursts probably reflected the activity of single ClpB hexamers,
because they consisted of continuous run-slip-run activity and were
spaced apart by several seconds. Without slowing down, ClpB exerted
high forces of more than 50 pN, resulting in the melting of our DNA
tethers (Fig. 1e). These data indicated notable speed, processivity
and power.
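
A two-peaked speed distribution of this kind can be decomposed into two Gaussian components. The sketch below is only an illustration under stated assumptions: it uses expectation-maximization on a two-component mixture rather than the histogram-based double Gaussian fit reported for Fig. 1d, and it runs on deterministic, well-separated toy data, not on the measured speeds. The function names and the toy parameters are hypothetical.

```python
import math
from statistics import NormalDist


def fit_two_gaussians(values, iterations=200):
    """EM fit of a two-component 1D Gaussian mixture.

    Returns two (weight, mean, sd) tuples sorted by mean.
    """
    xs = sorted(values)
    n = len(xs)
    # Crude starting point: quartiles as means, the global spread as both widths.
    mu = [xs[n // 4], xs[(3 * n) // 4]]
    mean_all = sum(xs) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    sd = [sd_all, sd_all]
    w = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        comps = [NormalDist(mu[k], sd[k]) for k in range(2)]
        resp = []
        for x in xs:
            p0 = w[0] * comps[0].pdf(x)
            p1 = w[1] * comps[1].pdf(x)
            total = p0 + p1
            resp.append((p0 / total, p1 / total))
        # M-step: re-estimate weight, mean and width of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-6)
    return sorted(zip(w, mu, sd), key=lambda c: c[1])


def synthetic_speeds(n_per_mode=200):
    """Deterministic toy speeds (aa/s) from two Gaussians; not the measured data."""
    quantiles = [(i + 0.5) / n_per_mode for i in range(n_per_mode)]
    slow = [NormalDist(240.0, 30.0).inv_cdf(q) for q in quantiles]
    fast = [NormalDist(480.0, 40.0).inv_cdf(q) for q in quantiles]
    return slow + fast


if __name__ == "__main__":
    (w1, m1, s1), (w2, m2, s2) = fit_two_gaussians(synthetic_speeds())
    print(f"slow mode: {m1:.0f} aa/s, fast mode: {m2:.0f} aa/s, ratio {m2 / m1:.2f}")
```

On real data the two components overlap more strongly than in this toy set, so the starting values and the number of iterations would need more care.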


¹AMOLF, Amsterdam, The Netherlands. ²Center for Molecular Biology of Heidelberg University, German Cancer Research Center, Heidelberg, Germany. ³Department of Bionanoscience, Kavli
Institute of Nanoscience Delft, Delft University of Technology, Delft, The Netherlands. *e-mail: s.tans@amolf.nl
Fig. 1 | ClpB is a processive translocase. a, Tethered MBP was unfolded with
optical tweezers, relaxed to a low force that prohibits refolding, and exposed to
ClpB and ATP. b, Tether contraction bursts (orange regions) with ClpB(Y503D)
and ATP. Grey, raw signal (500 Hz); red, filtered signal (2 Hz). c, Polypeptide
contour length Lₑ during a contraction burst, as determined from the bead-bead
distance (D), force, and worm-like chain model. Lₑ decreases linearly from
360 aa (MBP fully extended) down to 0 aa (MBP C and N termini directly
adjacent), indicating processive translocation by ClpB. Abrupt increases of Lₑ
to 360 aa indicate that ClpB transiently loses grip and substrate slips
backwards, pulled by the applied force. Red, filtered signal (20 Hz). d, Speed


Different hypothetical translocation models or topologies could
be considered (Fig. 1f). Even when their termini are not free for insertion,
as is the case here, single polypeptide chains can be accommodated
into the ClpB pore by rings that open and close or that assemble
around them (model I). However, this scenario would only produce
the observed contractions if a second chain site is immobilized on
ClpB (model II), analogous to DNA processing by condensin. Alternatively,
the substrate could be inserted as a loop into the central
pore, with translocation of one (model III) or both (model IV) arms
of the loop.

Testing these models with optical tweezers is difficult. We therefore
developed a technique that allows independent measurement of
the length of each arm of the polypeptide loop and integrates optical
tweezers with ClpB tracking at sub-wavelength resolution using
single-molecule fluorescence imaging (Fig. 2a, Extended Data Fig. 5). We chose
the construct with two maltose-binding protein (MBP) repeats (2MBP),
as it yields longer runs, and exposed it to fluorescently labelled ClpB and
ATP after unfolding, while scanning a confocal excitation beam along
the tether and beads (Fig. 2b). To limit the parasitic signal emanating
from the beads, we developed a protein-DNA coupling protocol that
enabled the attachment of long 5-kilobase pair (kbp) DNA handles
(see Methods). Single ClpB-binding events were identified by a fluorescent
spot appearing between the beads (Fig. 2b), and translocation was
observed soon after (Fig. 2c). We next moved to an ATP-only solution to
reduce background fluorescence and prevent further ClpB binding, and
tracked the spot position using Gaussian fitting (Fig. 2d-f). Combining
the tweezers and tracking data yielded the distances between ClpB and
each of the MBP termini, and hence the translocation activity on both
loop arms independently (Fig. 2a, Methods).
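
Locating a fluorescent spot to better than one pixel relies on the shape of the intensity profile rather than on the brightest pixel alone. The sketch below illustrates the idea with a three-point estimator: because the logarithm of a Gaussian is a parabola, the vertex of the parabola through the brightest pixel and its two neighbours gives the centre. This is a simplified stand-in for a full Gaussian fit, it is not the authors' tracking code, and the function names are hypothetical. It assumes a single spot, strictly positive counts, and a peak that is not at the edge of the scan.

```python
import math


def subpixel_peak(counts):
    """Centre of a single Gaussian-shaped spot in a 1D profile, in pixel units."""
    # Brightest interior pixel, so that both neighbours exist.
    m = max(range(1, len(counts) - 1), key=lambda i: counts[i])
    left, mid, right = (math.log(counts[i]) for i in (m - 1, m, m + 1))
    curvature = left - 2.0 * mid + right  # negative for a peak
    return m + 0.5 * (left - right) / curvature


def gaussian_profile(centre, sigma, amplitude, n_pixels):
    """Noise-free Gaussian intensity profile sampled at integer pixel positions."""
    return [
        amplitude * math.exp(-((i - centre) ** 2) / (2.0 * sigma ** 2))
        for i in range(n_pixels)
    ]


if __name__ == "__main__":
    profile = gaussian_profile(centre=7.3, sigma=2.0, amplitude=100.0, n_pixels=16)
    print(f"estimated centre: {subpixel_peak(profile):.3f} px")
```

With photon-counting noise, a least-squares or maximum-likelihood fit over the whole profile is more robust than three points, at the cost of more computation per scan line.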

We found various sequences of events: after translocation of the 
entire chain, the left arm of the loop was released and slipped backwards 
until the full chain was again extended in cis, and subsequently left-arm 
translocation restarted rapidly (Fig. 2g, h, event sequence A>B>C). 
A similar sequence on the right side occurred directly afterwards 
distribution of runs from all MBP substrate constructs in the presence of
ClpB(Y503D) and ATP, at a force of approximately 8 pN. Double Gaussian fit
shows two mean speeds, v₂ ≈ 2v₁, with v₁ = 240 ± 30 aa s⁻¹ and v₂ = 450 ± 130 aa s⁻¹
(mean ± s.d., n = 800 runs, 18 molecules). e, Mean translocation speed for
ClpB(Y503D) versus applied tension (n = 717 runs, 8 molecules; see Methods).
Grey, DNA melting regime and upper force limit. Data are mean ± s.e.m.
f, Hypothetical ClpB translocation topologies. Single-strand insertion and
translocation (I) does not yield contraction, unless it is immobilized elsewhere
on the ClpB surface (II). Dual-strand insertion in a looped topology can also
produce contraction, either by single-arm (III) or dual-arm (IV) translocation.


(Fig. 2g, h, A>D>E). We also observed both arms being translocated
simultaneously, each at similar velocity (Fig. 2g, i, event F, Extended
Data Fig. 5i-k). Consistently, the total translocation speed, which
reflects the velocity at which both polypeptide termini approach each
other, and is more accurate as it is based only on the signal from the optical
tweezers, was then twice as high (2v) as in single-arm translocation
runs (v) (Fig. 2k, grey region). Model II does not allow for two-arm translocation
and hence was not consistent with the data, whereas models
III and IV were consistent with the data. Switches between single- and
two-arm translocation modes took place after blockage of one arm,
typically on ClpB encountering the DNA tether at either terminus. The
data also provided direct confirmation that single ClpB rings remained
intact and bound during runs, switches and back-slips (Fig. 2h, i).

This scenario was supported by multiple additional observations.
First, 64% of the very first runs in a translocation burst initially showed
the higher speed (2v) before switching to the lower speed (v), compared
with 22% when considering all runs (Extended Data Fig. 6a).
Indeed, the initial ClpB binding site is probably not directly adjacent
to the DNA handles at the termini, and thus both arms are then unobstructed
when translocation starts. Initial binding regions estimated
from these experiments were consistent with peptide scanning data,
although we note that both methods yield rough estimates (Extended
Data Fig. 6). Second, experiments at increased resolution showed that
lower-speed (v) runs were composed of individual translocation steps
of 14.6 ± 0.9 aa, whereas higher-speed (2v) runs were in steps of 28 ± 3 aa
(Fig. 3a-d, Extended Data Fig. 7a-e). These findings are consistent,
since decreases in distance between termini should be twofold larger
when both arms are translocated simultaneously. ClpB thus switches
between translocation modes by changing the step size rather than
the step frequency (Fig. 3e).
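
Step sizes in a staircase-like trace can be read from the distribution of differences between all pairs of points, where regularly spaced peaks mark multiples of the step. The sketch below shows the principle on an idealized, noise-free staircase. It is a minimal illustration with hypothetical function names and an assumed bin width, not the analysis applied to the measured traces, which are noisy and were filtered before analysis.

```python
from collections import Counter


def pairwise_step_size(trace, bin_width=0.2, min_count_fraction=0.05):
    """Mean spacing between peaks in the histogram of pairwise differences."""
    hist = Counter()
    n = len(trace)
    for i in range(n):
        for j in range(i + 1, n):
            hist[round(abs(trace[i] - trace[j]) / bin_width)] += 1
    threshold = min_count_fraction * max(hist.values())
    # A peak is a sufficiently populated bin that is not lower than its neighbours.
    peaks = sorted(
        b for b, c in hist.items()
        if c >= threshold and c >= hist.get(b - 1, 0) and c >= hist.get(b + 1, 0)
    )
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to estimate a step size")
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    return bin_width * sum(spacings) / len(spacings)


def staircase(start, step, n_levels, dwell_points):
    """Idealized decreasing staircase: n_levels plateaus of dwell_points samples."""
    return [start - step * k for k in range(n_levels) for _ in range(dwell_points)]


if __name__ == "__main__":
    single = pairwise_step_size(staircase(360.0, 14.6, 10, 20))
    dual = pairwise_step_size(staircase(360.0, 29.2, 10, 20))
    print(f"single-arm step: {single:.1f} aa, dual-arm step: {dual:.1f} aa")
```

Doubling the step at a fixed dwell time per plateau doubles the speed, which is the distinction between a change in step size and a change in step frequency.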

We next investigated how these stepping dynamics relate to the
structure of ClpB. Each ClpB monomer is thought to move substrates
by approximately 2 aa, substantially less than the observed 14-aa or
28-aa steps. A proposed concerted action of all subunits together
Fig. 2 | Optical tweezers with fluorescence reveals ClpB
translocation of both loop arms. a, Principle of
approach: one trap is continuously moved to maintain
force constant. Bead positions yield polypeptide N-to-C-terminal
distance at nanometre precision (expressed in
contour length Lₑ). Confocal fluorescence imaging of
ClpB-Atto633 yields its position at sub-wavelength
precision using Gaussian fitting. Together, they quantify
the lengths of both non-translocated (cis) polypeptide
arms: L₁ (blue) and L₂ (purple). b, Fluorescence
kymograph from scans along beads and tether, showing
ClpB binding (blue arrow) and movement to the ClpB-free
region. c, Concurrent tweezers data of polypeptide
contour length Lₑ, showing translocation start soon after
ClpB binding. d, Kymograph during translocation.
e, Photon count of ClpB spot along two scans and
Gaussian fits that determine position. f, Position of ClpB-Atto633
(in number of pixels), moving suddenly down at
back-slip B (h) and gradually up during translocation C.
Back-slip D does not change ClpB position, because slip is
on the right (blue), and left-arm linked to stationary bead
remains unchanged (purple). Top line, ClpB at left-hand
MBP terminus; bottom line, ClpB is at right-hand
terminus and polypeptide is fully in cis. Consistently,
ClpB deviates from top line when tweezers detects back-slip.
g, Cartoons indicating positions corresponding to
plots in h and i. A, polypeptide is fully in trans (Lₑ = 0);
B and D, back-slip of left and right arm, respectively. C
and E, translocation of left and right arm, respectively.
F, translocation of both arms. h, i, L₁ and L₂ for
kymographs shown in d and Extended Data Fig. 5g.
Grey-shaded region, double-speed translocation.
Consistently, both arms shorten simultaneously.

would yield a continuous series of 2-aa steps, and thus appears inconsistent
with these results. By contrast, the six subunits acting in rapid
consecutive manner would produce steps similar to those detected
here (approximately 12 aa, or approximately 24 aa when both arms are
moving). The pauses between steps could thus reflect a slow transition
within the ATP cycle occurring in all subunits. Zooming into the steps
should show six substeps of around 2 aa, but these cannot be resolved
owing to the particularly high translocation speed and the need to
time-average at these length scales. Therefore, we mixed ATP with the poorly
hydrolysable ATPγS, as this would be expected to interrupt sequential
Fig. 3 | Translocation steps by ClpB. a, b, Plot of Lₑ for single-speed (a)
and dual-speed (b) translocation runs. Red, Savitzky-Golay filtering.
c, d, Distribution of Lₑ difference between any two points for one single-speed
run (c) or one dual-speed run (d). The regularly spaced peaks indicate a step
size of 14.6 ± 0.9 aa for the single-speed run; the peak spacing for the dual-speed
run is doubled, yielding a step size of 28 ± 3 aa. Data are mean ± s.e.m.
calculated from n = 12 runs. e, The data show that the speed is doubled by
doubling the step size (red), not the step frequency (blue).


j, k, Total cis-polypeptide length, Lₑ = L₁ + L₂, from
tweezers alone.


subunit action moving along the ClpB ring and therefore yield smaller 
steps. Translocation was more erratic and indeed showed smaller steps 
well below 14 aa in size (Extended Data Fig. 7g-i), rather than longer 
pauses only. These data thus supported the sequence-pause model. 

The complex dynamics observed thus far can be further complicated
when folded structures are present within the looped polypeptide.
Specifically, we found back-slips for 2MBP that were incomplete,
with a segment of about 270 aa remaining on the trans side of ClpB
(Fig. 4a, b). This is exactly the length of one MBP core, suggesting that
the polypeptide folded after translocation (in line with its normal folding
time of about 1 s) and was subsequently blocked at the trans side
of the narrow ClpB pore when pulled backwards during a back-slip
(Fig. 4d). Consistently, such incomplete back-slips only occurred after
full MBP cores were translocated (Fig. 4c) and were not observed for
folding-compromised 2MBP mutants or for 1MBP, whose core cannot
fold because a key segment remains stuck in the ClpB pore (Extended
Data Fig. 8). Of note, substrates refolded at the exit of the ClpB channel,
analogous to co-translational folding of nascent chains, and without
requiring DnaK.

Conversely, misfolded structures already present within the chain
should be blocked at the cis side of ClpB during translocation. Such
an obstruction of translocation can ramp up local forces and in turn
pull apart the blocking misfolded structure. Indeed, we observed such
disruption when a partially folded MBP or a small MBP aggregate was
exposed to ClpB (Extended Data Fig. 9). The disruption events were
directly followed by translocation runs because unfolded polypeptides
produced by structure disruption on the cis side of ClpB are available
for translocation. These data indicate how folded structures present
in cis and trans can affect translocation dynamics in a looped topology.

In conclusion, our study on ClpB shows unambiguously that polypeptide
loop extrusion is possible. Free substrate termini may also
insert into the ClpB pore and be translocated in a non-loop topology,
though we surmise that internal segments of aggregated proteins
are targeted more readily and hence translocated as loops. ClpB is
fast, processive, generates large forces, and can switch between
single- and dual-arm translocation. ClpB thus appears to maintain
a tight and long-term grip on both arms, with back-slips indicating
a sporadic loss of contact. It remains an open question how the
independent handling of two arms is achieved at the structural
level. These features of ClpB are relevant to efficient disaggregation
(Extended Data Fig. 10). Full dissolution of stable aggregates
probably involves multiple ClpB rings and other chaperones such as
Hsp70/DnaK acting at different sites, at different moments in time,
and involving many random dissociation and re-association events.
Nevertheless, ClpB translocation itself is remarkably deterministic
and processive once started.

Overall, our findings define loop extrusion as the mechanistic basis 
of Hsp100 disaggregation, highlight the need for tight regulation of 
Hsp100 activity and suggest that other polypeptide processing sys- 
tems such as the Cdc48 (mammalian orthologue p97) segregase, the 
ribosomal assembly factor Rix7 (mammalian orthologue NVL), and 
the 26S proteasome may also exploit the capability to handle multiple 
polypeptide strands in a controlled manner.
Methods


No statistical methods were used to predetermine sample size. The 
experiments were not randomized. The investigators were not blinded 
to allocation during experiments and outcome assessment. 


Protein expression and purification 

All MBP constructs were modified at both termini with cysteine residues
using the pET28 vector. Double-mutant MBP harbours two mutations
(V8G and Y283D) that hinder folding. Proteins were purified from
Escherichia coli BL21(DE3) cells. For overexpression, overnight cultures
were diluted 1:100 in fresh LB medium supplemented with 50 mg l⁻¹
kanamycin and 0.2% glucose, and incubated under vigorous shaking at
30 °C. Expression was induced at OD₆₀₀ = 0.6 by addition of 1 mM IPTG
and incubation overnight at room temperature. Cells were cooled, collected
by centrifugation at 5,000g for 20 min, flash-frozen and stored
at −80 °C. Cell pellets were resuspended in ice-cold buffer A (50 mM
potassium phosphate pH 7.5, 150 mM NaCl, 3 mM chloramphenicol,
50 mM Glu-Arg, 10 mM Complete Protease Inhibitor Ultra (Roche),
10 mM EDTA) and lysed using a pressure homogenizer. The lysate was
cleared from cell debris by centrifugation at 50,000g for 60 min and
incubated with Amylose resin (New England Biolabs) that was previously
equilibrated in buffer A for 20 min at 4 °C. The resin was washed
with buffer A three times by centrifugation and bound proteins were
eluted in buffer A supplemented with 20 mM maltose. Purified proteins
were aliquoted, flash-frozen in liquid nitrogen and stored at −80 °C.
ClpB and variants were overexpressed in E. coli ΔclpB::kan cells. Cell
pellets were resuspended in LEW buffer (50 mM NaH₂PO₄ pH 8.0, 300
mM NaCl, 5 mM β-mercaptoethanol) and lysed by French press. Cleared
supernatants were incubated with Protino Ni-IDA resin and bound
proteins were eluted by LEW buffer containing 250 mM imidazole.
ClpB-containing fractions were pooled and subjected to Superdex S200
16/60 size-exclusion chromatography in MDH buffer (50 mM Tris pH
7.5, 150 mM KCl, 20 mM MgCl₂, 2 mM DTT) containing 5% (v/v) glycerol.


ClpB-Atto633 labelling 

Labelling of ClpB-E731C variants with Atto633-maleimide was per- 
formed in PBS buffer according to the instructions of the manufacturer 
(ATTO-TEC). Labelled ClpB-E731C was separated from non-reacted 
Atto633 by size-exclusion chromatography using Superdex S200 
HR10/30 in MDH buffer containing 5% (v/v) glycerol. 


Attachment of DNA handles to substrates 

Protein substrates were buffer-exchanged using a PD-10 desalting column
(GE Healthcare) to remove reducing agents and elutants. Next,
they were incubated with a 4× excess of maleimide-modified oligonucleotides
(20 nucleotides) for 1 h at 30 °C. Uncoupled oligos were removed
using Amylose resin (NEB). The coupled protein was then ligated to
2.5-kbp DNA tethers presenting a complementary 20-nucleotide
single-stranded overhang using T4 ligase for 1 h at room temperature.


Optical tweezers assay 

Carboxyl polystyrene beads (CP-20-10, diameter 2.1 µm, Spherotech) 
were covalently coated with sheep anti-digoxigenin antibody 
(Roche) via a carbodiimide reaction (PolyLink Protein coupling kit, 
Polysciences). Approximately 50 ng of the generated construct was 
incubated with 2 µl beads in 10 µl buffer C (50 mM HEPES pH 7.5, 5 mM 
MgCl₂, 100 mM KCl) for 15 min in a rotary mixer at 4 °C and rediluted in 
350 µl buffer C. With our coupling strategy, approximately 50% of the 
constructs were asymmetrically functionalized with digoxigenin and 
biotin on each side. In order to create the second connection, we used 
NeutrAvidin-coated polystyrene beads (NVP-20-5, diameter 2.1 µm, 
Spherotech). Once trapped, beads were brought into close proximity 
to allow binding, and tether formation was identified by an increase in 
force when the beads were moved apart. ClpB was diluted in buffer C 
to a final concentration of 2 µM. For the ATP experiments, we used an 
ATP regeneration system (3 mM ATP, 20 ng µl⁻¹ pyruvate kinase, 3 mM 
phosphoenolpyruvate). Experiments were performed in the presence 
of an oxygen scavenging system (3 units per ml pyranose oxidase, 90 
units per ml catalase and 50 mM glucose, all purchased from 
Sigma-Aldrich) to prevent DNA and protein oxidation damage. 


Single-molecule data analysis and ClpB translocation event 
characterization 

Data were recorded at 500 Hz using a custom-built dual-trap optical 
tweezers and a C-Trap (Lumicks). Data were analysed using custom scripts 
in Python. The optical traps were calibrated using the power spectrum of 
the Brownian motion of the trapped beads, obtaining average stiffness 
values of k = 0.39 ± 0.04 pN nm⁻¹. Most measurements were taken in 
an active force-clamp regime, in which one of the traps was moved in 
response to changes in the force using a proportional-integral-derivative 
(PID) feedback loop (Extended Data Fig. 5e, f). Individual 
force-extension curves were identified and fitted to two worm-like-chain 
(WLC) models in series (Extended Data Fig. 1a), using a previously 
reported approximation of an extensible polymer for the DNA, and the 
Odijk inextensible approximation for the protein contribution. 
In these models, $\beta^* = F L_p^*/(k_B T)$, where $F$ is the force, 
$T$ is the temperature, $k_B$ is the Boltzmann constant and $L_p^*$, $K$ 
and $L_c^*$ are the persistence length, stretch modulus and contour 
length of DNA, respectively; $\beta = F L_p/(k_B T)$, where $L_p$ and 
$L_e$ are the persistence and extended length of the protein, 
respectively. $L_c^*$ values were 906, 1,750 and 3,500 nm for the three 
different DNA handles used (1.3, 2.5 and 5 kb, respectively), and $L_p$ 
was 0.75 nm. $L_p^*$ and $K$ were fitted, yielding average values of 
30 nm and 700 pN nm⁻¹, respectively. These fitted parameters 
were then used to compute the instantaneous extended length of the 
protein ($L_e$) using the same WLC model (Extended Data Fig. 1b). The 
translocated length ($L_t$) was computed by subtracting the extended 
length ($L_e$) from the total contour length of the protein ($L_c$). The 
unfiltered data (500 Hz) are displayed in all panels in grey. With the 
exception of Fig. 1b, the red signal always indicates data decimated to 
20 Hz. 
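
The conversion from a measured protein extension to the extended and 
translocated lengths can be sketched as follows. This is a minimal 
illustration, not the authors' package: it uses the standard Odijk 
inextensible relation, x = L(1 − ½√(k_BT/(F L_p))), and assumes that the 
DNA contribution has already been subtracted from the bead-to-bead 
distance. The function names and the value of k_BT (4.11 pN nm, roughly 
room temperature) are assumptions introduced for the example; only the 
protein persistence length of 0.75 nm comes from the text. 

```python
import math

KBT = 4.11          # pN nm; assumed thermal energy near room temperature
LP_PROTEIN = 0.75   # nm; protein persistence length quoted in the text


def odijk_factor(force, lp=LP_PROTEIN, kbt=KBT):
    """Fractional extension x/L of an inextensible worm-like chain (Odijk limit).

    Only meaningful at forces high enough for the factor to be positive.
    """
    factor = 1.0 - 0.5 * math.sqrt(kbt / (force * lp))
    if factor <= 0.0:
        raise ValueError("force too low for the Odijk approximation")
    return factor


def odijk_extension(force, contour_length):
    """End-to-end extension (nm) of a chain of given contour length (nm)."""
    return contour_length * odijk_factor(force)


def extended_length(force, protein_extension):
    """Invert the Odijk relation: contour length L_e of the stretched segment."""
    return protein_extension / odijk_factor(force)


def translocated_length(total_contour_length, force, protein_extension):
    """L_t = L_c - L_e, both in nm."""
    return total_contour_length - extended_length(force, protein_extension)
```

Because the Odijk factor approaches zero at low force, small errors in the 
measured extension are strongly amplified there, which is consistent with 
the noisier length estimates at low force noted in Extended Data Fig. 1. 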

To classify translocation events, the translocated length signal was 
smoothed using a Savitzky–Golay filter (Extended Data Fig. 4c, black 
line), enabling its time derivative to be calculated without large 
fluctuations. Back-slipping results in a large negative slope in the 
derivative, which was used as the criterion to separate individual 
translocation runs (Extended Data Fig. 4d). Subsequent one-dimensional 
dilation and erosion were performed to remove artefacts. Next, each 
individual run was similarly treated in order to find local changes in 
the slope (Extended Data Fig. 4e), setting as threshold a value between 
the two known speeds (around 70 and 140 nm s⁻¹, Extended Data Fig. 4f). 
Linear fits were performed in each identified region and reported as the 
local translocation speed (Extended Data Fig. 4e). Only fits that yielded 
r values higher than 0.8 were considered unless otherwise stated. Speed 
distributions were computed using the speeds of all valid runs for each 
condition. 
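
The run-separation step can be sketched as below. This is a simplified 
illustration under stated assumptions, not the authors' code: the 
derivative is taken as the least-squares slope in a centred window 
(equivalent to a first-order Savitzky–Golay derivative), the window size 
and the dilation radius are arbitrary example values, and the slip 
threshold of −50 nm s⁻¹ is the one quoted in the legend of Extended Data 
Fig. 4. All function names are hypothetical. 

```python
def local_slope(y, dt, half_window):
    """Least-squares slope in a centred window (first-order Savitzky-Golay derivative)."""
    n = len(y)
    slopes = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        xs = range(lo, hi)
        mx = sum(xs) / len(xs)
        my = sum(y[lo:hi]) / len(xs)
        num = sum((x - mx) * (y[x] - my) for x in xs)
        den = sum((x - mx) ** 2 for x in xs)
        slopes.append(num / (den * dt))
    return slopes


def dilate(mask, r):
    """1D binary dilation with a window of radius r."""
    n = len(mask)
    return [any(mask[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def erode(mask, r):
    """1D binary erosion with a window of radius r."""
    n = len(mask)
    return [all(mask[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def split_runs(length, dt, slip_threshold=-50.0, half_window=5, r=2):
    """Return (start, stop) index pairs of translocation runs between back-slips."""
    slip = [s < slip_threshold for s in local_slope(length, dt, half_window)]
    slip = erode(dilate(slip, r), r)  # dilation then erosion fills short gaps
    runs, start = [], None
    for i, is_slip in enumerate(slip):
        if not is_slip and start is None:
            start = i
        elif is_slip and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(slip)))
    return runs


# Synthetic trace (nm): run at 100 nm/s, fast back-slip, second run.
trace = list(range(101)) + [100 - 10 * k for k in range(1, 11)] + list(range(1, 101))
runs = split_runs(trace, dt=0.01)
```

Each returned index pair could then be passed to a linear fit to obtain the 
local translocation speed, keeping only fits that meet the r > 0.8 
criterion described above. 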


Translocation-step characterization 

To increase the spatial resolution, we tethered a single MBP using 
1,300-bp DNA handles, 500 µM ATP and high tension (>20 pN). Raw 
data were smoothed using a Savitzky–Golay filter of 5th order with a 
window of 21 data points. The difference between every distinct pair 
of data points was calculated and the sample was binned to compute 
the pairwise length distribution. To calculate the periodicity more 
accurately, we computed the autocorrelation of the pairwise length 
distribution using the Pearson correlation coefficient for different 
lag lengths (Extended Data Fig. 7a, b) and its power spectrum using 
the Welch method (Extended Data Fig. 7c, d). The autocorrelation 
distribution was fitted using the equation: 


$$\left[c\,\cos\!\left(\frac{2\pi t}{T}\right) + m\,t + n\right] e^{-t/p}$$ 


where $T$ is the period of the steps, and a linear and an exponential 
function have been introduced to account for the decay in the signal 
(Extended Data Fig. 7a, b). The peak in the power spectrum was fitted to 
a Gaussian distribution (Extended Data Fig. 7c, d). 
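
The pairwise length distribution and its autocorrelation can be sketched 
as follows. This is an illustrative, standard-library example rather than 
the authors' implementation; the function names, the 1-aa bin width and 
the noise-free staircase used as input are assumptions, and the fitting 
and Welch power-spectrum steps are omitted. 

```python
import math


def pairwise_histogram(positions, bin_width=1.0):
    """Histogram of |x_j - x_i| over every distinct pair of data points."""
    diffs = [abs(b - a) for i, a in enumerate(positions) for b in positions[i + 1:]]
    counts = [0] * (int(max(diffs) / bin_width) + 1)
    for d in diffs:
        counts[int(d / bin_width)] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def autocorrelation(counts, max_lag):
    """Pearson correlation of the histogram with itself shifted by 1..max_lag bins."""
    return [pearson(counts[:-lag], counts[lag:]) for lag in range(1, max_lag + 1)]


# Noise-free staircase with 14-aa steps, 20 samples per plateau (synthetic).
trace = [14.0 * step for step in range(4) for _ in range(20)]
acf = autocorrelation(pairwise_histogram(trace), max_lag=20)
period = 1 + max(range(len(acf)), key=lambda k: acf[k])  # lag of the highest peak
```

On real traces the autocorrelation peaks are broadened by noise, which is 
why the text fits an oscillating function with a decaying envelope rather 
than simply reading off the highest lag. 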


Confocal fluorescence measurements 

An excitation laser beam with a wavelength of 638 nm and a power 
of 1.3 mW was scanned along the beads and tether at a line rate of 12 
Hz. To avoid parasitic noise from the beads, proteins were tethered 
using two 5-kbp instead of 2.5-kbp DNA handles. In addition, the 2MBP 
construct was used in order to observe larger distance changes. Force 
spectroscopy and confocal microscopy data were synchronized based 
on the movement of the beads (Extended Data Fig. 5a–d). The edge 
of the moving bead was tracked using a Gaussian fit (Extended Data 
Fig. 5b), and overlaying it on the optical tweezers signal for the 
bead movement showed a time offset (Extended Data Fig. 5c). In order 
to determine the value of this offset, we computed the root-mean-square 
deviation (r.m.s.d.) between the signals for different time offsets 
(Extended Data Fig. 5d): 


$$\mathrm{r.m.s.d.}(\tau) = \sqrt{\frac{\sum \left[X(t) - x(\tau)\right]^{2}}{N(\tau)}}$$ 


where $\tau$ is the time offset applied to the tracked signal, $N(\tau)$ 
is the total number of points, $X(t)$ is the position of the bead 
according to the voltage of the mirror and $x(\tau)$ is the tracked 
position from the fluorescence kymograph. Minimization of 
r.m.s.d.($\tau$) provides an excellent estimate of the time offset 
between the signals (Extended Data Fig. 5d). 
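
A minimal sketch of this offset search is given below. It assumes that both 
signals have been resampled onto the same time grid, so that the offset is 
an integer number of samples; the function names and the synthetic 
ramp-and-hold bead movement are hypothetical and serve only to illustrate 
the minimization. 

```python
import math


def rmsd(reference, tracked, shift):
    """r.m.s.d. between reference[i] and tracked[i + shift] over their overlap."""
    pairs = [(reference[i], tracked[i + shift])
             for i in range(len(reference))
             if 0 <= i + shift < len(tracked)]
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))


def best_offset(reference, tracked, max_shift):
    """Shift (in samples) that minimizes the r.m.s.d. between the two signals."""
    shifts = range(-max_shift, max_shift + 1)
    return min(shifts, key=lambda s: rmsd(reference, tracked, s))


# Synthetic bead movement from the mirror signal: hold, ramp, hold.
mirror = [0.0] * 20 + [float(i) for i in range(1, 21)] + [20.0] * 20
# Kymograph-tracked position lagging the mirror signal by three samples.
kymograph = [0.0] * 3 + mirror[:-3]
offset = best_offset(mirror, kymograph, max_shift=10)
```

Note that the number of overlapping points changes with the shift, which is 
why the sum of squares is normalized by N(τ) rather than by a fixed count. 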


Integration of optical tweezers and imaging signals to compute 
the length components 

After ClpB binding and moving to a region containing only ATP, the 
fluorescent spot between the beads was fitted to a Gaussian distribution. 
To reduce the noise of the signal, we averaged the intensity profiles 
of three scanning lines before fitting. The resulting trajectory yielded 
the absolute position of ClpB with subpixel precision, which was then 
converted to nanometres using a factor of 80 nm per pixel. 
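
The averaging and sub-pixel localization can be illustrated as follows. 
This sketch deliberately swaps the least-squares Gaussian fit described in 
the text for a closed-form three-point estimate on the logarithm of the 
intensity, which is exact only for a noise-free, background-free Gaussian; 
it is meant to show the principle, not to reproduce the authors' fit. The 
function names and the synthetic profile are assumptions; the 80 nm per 
pixel factor is the one quoted above. 

```python
import math

NM_PER_PIXEL = 80.0  # conversion factor quoted in the text


def average_lines(lines):
    """Average several equal-length intensity profiles pixel by pixel."""
    return [sum(column) / len(column) for column in zip(*lines)]


def gaussian_peak_position(profile):
    """Sub-pixel peak centre from the brightest pixel and its two neighbours.

    For a Gaussian, log-intensity is a parabola, so three points fix its vertex.
    Requires strictly positive intensities.
    """
    i = max(range(1, len(profile) - 1), key=lambda k: profile[k])
    a, b, c = (math.log(profile[k]) for k in (i - 1, i, i + 1))
    return i + 0.5 * (a - c) / (a - 2.0 * b + c)


# Three identical synthetic scan lines with a spot centred at pixel 12.3.
line = [100.0 * math.exp(-((x - 12.3) ** 2) / (2 * 2.0 ** 2)) for x in range(25)]
profile = average_lines([line, line, line])
position_px = gaussian_peak_position(profile)
position_nm = position_px * NM_PER_PIXEL
```

With noisy data a full fit over the whole spot, as used in the text, is 
more robust than this three-point estimate. 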

Next, we computed the position of each bead edge that is closest to 
ClpB (bottom edge for top bead and vice versa) using the trap position, 
bead displacement and bead radius. Although it is possible to obtain 
these positions from the fluorescence kymograph, the optical tweezers 
data yield higher spatial resolution. We subtracted the ClpB tracked 
position from the position of the bottom edge of the top bead, and we 
subtracted the position of the top edge of the bottom bead from the 
ClpB position. These distances contain an arbitrary shift owing to the 
mismatched reference system of the optical tweezers and confocal 
fluorescence images. In order to identify the offset, we used the fact 
that when the polypeptide is completely translocated (information 
present in the optical tweezers data, such as Fig. 1c or Extended Data 
Fig. 2), both distances should be equal to each other and equal to half 
the distance D between the edges of the beads. After correcting for the 
shift, we obtained the absolute distance between ClpB and each of the 
beads ($D_L$ and $D_R$). Since we use a force clamp, any change in 
distance is solely due to a change in the protein length 
($\Delta x_L = \Delta D_L$ and $\Delta x_R = \Delta D_R$, 
Extended Data Fig. 5e, f). Therefore, we removed the constant DNA 
contribution and computed the protein contour length from each 
distance ($L_L$ and $L_R$) using the WLC model. 


Peptide library data and initial ClpB binding location 

The MBP peptide library was prepared by automated spot synthesis 
by JPT Peptide Technologies (PepSpots). The library is composed of 
13-mer peptides scanning the MBP primary sequence with an overlap of 
10 residues. One micromolar ClpB-NTD (Met1-Ser148) was incubated 
for 30 min in buffer P (10 mM Tris pH 7.5, 150 mM KCl, 20 mM MgCl₂, 5% 
(w/v) sucrose and 0.005% (v/v) Tween 20) with the library. Afterwards, 
buffer P was discarded and the membrane was washed with cold TBS (50 
mM Tris pH 7.6, 150 mM NaCl). Fractionated western blotting enabled 
transfer of ClpB-NTD bound to peptide spots onto PVDF membranes, 
and bound ClpB-NTD was detected by use of specific, polyclonal (rabbit) 
anti-ClpB-NTD serum. 

The obtained blot image was divided into regions and the individual 
intensities were computed (Extended Data Fig. 6d). A Gaussian filter 
was applied to the resulting distribution to account for sequence 
overlap, and mirroring was performed for direct comparison with the 
optical tweezers data (Extended Data Fig. 6e). 


ATPase activity assay 

MBP-DM was denatured in 50 mM Tris, 25 mM KCl, 10 mM MgCl₂, 6 M 
urea and 2 mM DTT. The ATPase activity of the different variants was 
determined in 50 mM Tris, 25 mM KCl, 10 mM MgCl₂, 0.4 M urea and 2 
mM DTT in the presence of 2 mM ATP. ATPase measurements were started 
by addition of substrate. 


Additional statistical calculations 
Error bars of proportion histograms (Fig. 4c and Extended Data Fig. 6a) 
were calculated using the standard error of a binomial distribution: 

$$\mathrm{s.e.} = \sqrt{\frac{p(1-p)}{N}}$$ 

where $p$ is the success proportion and $N$ is the total number of 
observations. 
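
As a worked illustration, the standard error of a proportion can be 
computed as below. The function name and the example inputs are 
hypothetical and are not values taken from the figures. 

```python
import math


def binomial_standard_error(p, n):
    """Standard error sqrt(p (1 - p) / N) of a proportion p from N observations."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive count")
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical example: half of 100 observations counted as successes.
error_bar = binomial_standard_error(0.5, 100)
```

The error is largest at p = 0.5 and shrinks with the square root of the 
number of observations, so bars based on few runs are necessarily wide. 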

Statistical sizes for bar plots are: Fig. 1e: 20, 9, 52, 58, 31, 48, 77, 
40, 4, 41, 14 and 5 for each point; Fig. 4c: 41, 30, 29, 14, 23, 23, 25, 
19, 14, 19 and 29 for each bar. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The data that support the findings of this study are available from the 
corresponding authors on reasonable request. 


Code availability 


All data were analysed using a custom Python package that is available 
online and can be downloaded upon request to the corresponding 
author. 
Extended Data Fig. 1 | Mechanical unfolding of substrates and extended 
length description. a, c, e, Force-extension curves showing the 
characteristic unfolding pattern: MBP (a), the 2MBP (c) and the 4MBP 
construct (e), with an initial gradual and discrete unfolding of 
C-terminal α-helices (Extended Data Fig. 8a) followed by a sharp 
unfolding of the cores. Grey lines show WLC fits to the data. Red 
indicates pulling and blue indicates relaxing of the protein chain. 
b, d, f, The corresponding extended length, $L_e$, of MBP (b), the 2MBP 
(d) and the 4MBP construct (f). $L_e$ reflects the contour length along 
the polypeptide backbone, but only of the part of the protein that is 
compliant (that is, unfolded and at the cis side of ClpB). $L_e$ is 
determined from the measured force and extension (distance between 
beads), using the WLC model of a non-interacting chain. Grey lines, 
contour length values obtained from the WLC fits. At low forces, the WLC 
curves of different contour lengths converge, yielding noisy data. 
Extended Data Fig. 2 | Translocation by ClpB variants. a, ClpB monomer 
structure indicating all tested variants. These variants (except K476C) 
were generated in the constitutively active Y503D background. Variants 
E279A and E678A are Walker B mutants in the nucleotide-binding domains 
NBD1 and NBD2, respectively. These mutations abolish ATP hydrolysis at 
NBD1 or NBD2. Variants Y251A and Y653A are pore-loop mutants in NBD1 and 
NBD2, respectively. These mutations affect substrate interaction in the 
ClpB pore at either NBD1 or NBD2. The K476C variant undocks the middle 
domain (MD), mimicking the effect of Hsp70 (DnaK) activation. MD 
undocking in the Y503D variant is more pronounced, and therefore 
activation is more robust. An additional construct (ClpB(ΔN)) lacked the 
N-terminal domain (NTD), hindering initial substrate binding. Finally, 
the variant E731C harbours a cysteine at the bottom of NBD2 for 
fluorophore labelling. b, Fraction of time showing activity (f,) for 
different mutants (in Y503D background, except K476C and wild type 
(WT)). c, Average translocation speed for all ClpB variants tested. KJE 
is the DnaK system (DnaK, DnaJ and GrpE). The median is displayed 
as a horizontal line within the box, and the mean as a white square. 
Whiskers indicate the lowest datum still within 1.5 interquartile range 
(IQR) of the lower quartile, and the highest datum still within 1.5 IQR 
of the upper quartile. Sample sizes: n = 1,139 (Y503D), n = 24 (K476C), 
n = 7 (wild type) runs. d, Translocation example for ClpB(K476C). Scale 
bars correspond to 200 aa and 10 s. e, Translocation example for 
wild-type ClpB with the DnaK system (DnaK, DnaJ and GrpE). Scale bars 
correspond to 200 aa and 5 s. f, g, Absolute ATPase rate (f) and ATPase 
substrate-stimulation (g) for the three ClpB variants and different 
substrate conditions (mean ± s.d.). ATPase activity is higher and more 
strongly stimulated for Y503D, followed by K476C and wild type. The 
lower activities observed in the presence of denatured MBP-DM with 
respect to casein may reflect lower affinity and lower concentrations 
due to aggregation. The ATPase activity assay was repeated three times 
for all conditions in f and g, except for K476C, WT + MBP, and Y503D + 
casein, for which it was repeated two times. 
Extended Data Fig. 3 | Translocation runs for different constructs and 
molecules. Traces of protein extended length contractions in the 
presence of ClpB(Y503D) and ATP. a, MBP ($L_c$ = 360 aa). b, 2MBP 
($L_c$ = 720 aa). c, 4MBP ($L_c$ = 1,440 aa). d, Speed distribution of 
translocation runs for the three different constructs (number of runs: 
n = 213 (MBP), n = 287 (2MBP), n = 306 (4MBP)). All show a similar range 
of speeds, as expected, with one main peak (at v = 240 aa s⁻¹) and a 
second peak or shoulder at twice the magnitude (2v). A slight change in 
the ratio is observed between the two peak heights, with 2v becoming 
more pronounced in the longer constructs. This difference could reflect 
that the distance between the initial ClpB binding site and the 
arresting DNA handles is then larger, and hence double-arm translocation 
more likely (see also Extended Data Fig. 4). e, Translocation speed 
distributions from three different substrate molecules (number of runs: 
n = 218 (purple), n = 102 (orange) and n = 114 (green)), which show no 
significant variability between individual substrates. f, Translocation 
speed distributions for three different translocation bursts, which show 
continuous run-slip-run activity, and are thus surmised to reflect the 
action of individual ClpB hexamers (number of runs: n = 25 (purple), 
n = 26 (orange) and n = 49 (green)). Distributions are for ClpB(Y503D) 
and ATP, at approximately 8 pN. The data indicate no significant 
variability in the translocation activity between ClpB hexamers. The 
burst duration varied between 5 and 80 s, whereas the time between 
bursts varied between 5 and 150 s, for the 2MBP construct. g, Example 
translocation run of MBP showing the definitions of run length and run 
duration. Run duration is calculated as the time from the start of a run 
until the next back-slipping event, including the pause after 
translocation and before the next back-slip. Run length is calculated as 
the length difference between the start of a run and the next 
back-slipping event. h, i, Run length (h) and run duration (i) (see g) 
distribution for constructs of different lengths. Notably, the run 
duration distributions are similar for the constructs of different 
length, which suggests that the moment ClpB loses grip on one of the 
arms and causes the back-slip is determined by events that are intrinsic 
to the ClpB hexamer, and do not depend on the substrate nor the 
encounter with blockades (such as the DNA tether). This would make 
functional sense in the physiological context, as ClpB can then continue 
to push in an attempt to disrupt aggregated structures. By contrast, 
the switch between double- and single-arm translocation is directly 
triggered by such blockades, though without losing grip on either of the 
two arms. 
Extended Data Fig. 4 | Speed characterization of translocation runs. 
a, Translocated length ($L_t$) during threading of 2MBP by ClpB(Y503D). 
Raw data (light grey, 500 Hz) are filtered using a Savitzky–Golay filter 
(black line). b, Local translocation speed calculated as the time 
derivative of the translocated length after Savitzky–Golay filtering. 
Negative slopes below −50 nm s⁻¹ (horizontal line) are considered 
back-slipping events (orange areas, also in a) and help in determining 
isolated translocation runs. c, Identification of different speeds 
within a single translocation run. Linear fits are used to calculate the 
speed of the run (green and magenta lines), most times revealing two 
main velocities, one double that of the other. d, Time derivative of the 
filtered translocated length for a single run, with solid black lines 
indicating the threshold speeds to distinguish no translocation from 
single- and double-translocation velocities, and green and magenta 
indicating the fitted velocity values (also shown in c). 
Extended Data Fig. 5 | Integrated tweezers and fluorescence particle 
tracking method. a–d, Synchronization of fluorescence and tweezers 
signals. a, Confocal scanning kymograph of two trapped beads. 
b, Intensity profile of a scanning line (blue in a), with a Gaussian fit 
of the edge of the moving bead (bottom) in blue. c, Offset between the 
fluorescence detection of bead movement as shown in b (blue dots), and 
the high-resolution tweezers signal of trap and bead movement (black 
line). d, Root mean square deviation (r.m.s.d.) between the signals for 
different time shifts τ. The minimum is marked with a triangle and 
represents the best estimation of the offset between the signals. 
e, f, Force clamp and computation of the two length components. 
e, Scheme of the lengths involved. $D_L$ and $D_R$, distances between 
beads and ClpB; $x_L$ and $x_R$, distances between protein termini and 
ClpB. Note that these distances are not contour lengths. f, Bead and 
ClpB position changes for left-arm (left) and right-arm (right) 
translocation. g, Kymograph underlying data in Fig. 2i. 
h, Corresponding tracked position of ClpB. Horizontal lines indicate 
extreme ClpB positions. Top line, ClpB is positioned at the left-hand 
terminus (see e and f). Bottom line, ClpB at the right-hand terminus; no 
polypeptide is translocated (the complete polypeptide is thus on the cis 
side of ClpB). Deviations from the top line consistently occur at 
back-slip moments detected by the tweezers (j; see the two shorter 
back-slips), which shows that the left arm (red) back-slips. Some 
back-slips detected by the tweezers do not show a corresponding ClpB 
movement, which is expected because right-arm back-slips should not 
change the ClpB position. i, Corresponding lengths of left arm (red) and 
right arm (blue) against time, as determined from fluorescence tracking 
(g, h) and tweezers (j) data. j, Corresponding tweezers data showing 
the distance between termini (contour length of cis segments). 
k, Distribution of the different translocation and back-slipping events 
observed in the fluorescence experiments (number of events n = 127, 
5 molecules). 
Extended Data Fig. 6 | Initial ClpB binding site estimation. a, Fraction 
of runs showing double speed when considering all runs (n = 1,704) and 
first runs only (n = 30). Data are mean ± standard error of a binomial 
distribution (see Methods). b, Example of first translocation run. ClpB 
binds at a certain location on the polypeptide and starts translocating 
both strands, yielding the double speed (green), until it encounters the 
closest terminus, when it switches to single-strand translocation (red). 
At the switch, the length translocated thus equals the distance between 
the initial binding site and the closest terminus (L,), but times two 
because ClpB also translocated the other arm. Afterwards, the second 
terminus is also reached, and translocation stalls and a back-slip 
occurs, although this is not relevant here. c, Kernel density estimation 
(KDE) distribution of the inferred binding locations based on first 
runs, as described in b (n = 30). The distribution is symmetric because 
N and C termini cannot be distinguished. d, Peptide library data 
indicating regions of MBP that are bound by ClpB(NTD). e, Spot 
intensities were quantified using a custom script in Python. For direct 
comparison with c, the spot intensity distribution was also mirrored. 
Extended Data Fig. 7 | Single translocation steps by ClpB. a–d, Analysis 
of step periodicity, related to Fig. 3. a, b, Autocorrelation of the 
pairwise length distribution for single-speed (a) and double-speed (b) 
runs from Fig. 3 (black dots). The red line is a fit, yielding period 
values of 14 and 28 aa, respectively. c, d, Power spectrum analysis of 
the pairwise length distribution for a (c) and for b (d), showing a peak 
that, fitted to a Gaussian distribution (red), yields 0.071 and 
0.037 aa⁻¹, respectively. This translates to 14 and 27 aa steps, in 
excellent agreement with the values obtained from the autocorrelation. 
e, The average step size is 14.6 ± 0.9 aa for single-speed translocation 
and 29 ± 3 aa for double-speed translocation (mean ± s.e.m., n = 8 and 
n = 4, 4 molecules), and [...] of ATP–ATPγS mixture (1 mM each). Longer 
pauses are observed during translocation because ATPγS is hydrolysed 
much more slowly than ATP, and therefore can result in stalling. The 
prolonged stalling seen here is in line with a sequential ATP hydrolysis 
along ClpB subunits. g–i, Notably, in these conditions, step sizes 
smaller than 14 aa are now observed. These findings provide further 
support for the 14-aa steps being produced by the rapid consecutive 
action of multiple or all 6 ClpB subunits, whose individual 2-aa 
sub-steps would remain unresolved. After starting, a hydrolysis sequence 
moving along the ClpB hexamer would then arrest prematurely when 
encountering an ATPγS-bound subunit, and hence yield a smaller step 
size. 
Extended Data Fig. 8 | Trans-refolding does not occur in single MBP and 
a mutant 2MBP construct. a, Structure of MBP (PDB ID: 2MVO), showing the 
C-terminal helices (red; around 90 residues) not required for core 
folding (blue). b, Cartoon representation of the extended MBP chain 
showing the C-terminal domain in red. After translocation arrest at the 
termini, segments at the N and C termini (approximately 20 aa each) 
remain stuck inside the ClpB pore, and are thus not available for 
folding. Whereas the C-terminal segment (red) is not crucial for core 
formation, the N-terminal segment (blue) is. Thus, trans-refolding of 
single MBP is not expected and indeed not observed. c, Cartoon 
representation of the extended 2MBP. The second MBP core (blue, 2) can 
fold in trans, since it now can translocate fully. d, Translocation 
run-and-slip activity for a tandem repeat of double mutant MBP 
(2MBP(DM)), which is compromised in refolding. Grey line indicates 
720 aa, red line corresponds to 0 aa and the orange line corresponds to 
310 aa, the length of one MBP core plus the two approximately 20-aa 
segments inside the pore. Back-slipping arrests at the orange line, as 
seen for 2MBP (Fig. 4), are no longer observed. e, Corresponding length 
distribution. Upon slipping, the released length (L,) is now typically 
equal to the previously translocated length (blue data follows red line, 
n = 203 runs, 6 molecules). 
Extended Data Fig. 9 | Disruption of folded and aggregated structures by 
ClpB. a, Extension length ($L_e$) of the 4MBP construct plotted against 
time in the presence of ClpB(Y503D) and ATP. b, Cartoons of event 
sequence suggested in a. (1) One MBP core is unfolded by increasing the 
force, immediately followed by relaxation to 5 pN to avoid unfolding 
other MBP cores. Some C-terminal helices also unfolded in this process. 
(2) After a waiting period, ClpB binds the unfolded part and 
translocates it completely. (3) ClpB reaches the neighbouring folded MBP 
domain (and the DNA tether), and hence no longer changes $L_e$. 
(4) After a short pause, $L_e$ increases in a discrete step of 270 aa, 
indicating the unfolding of one MBP core, which has precisely that 
length. (5) ClpB(Y503D) translocates briefly immediately afterwards, 
further indicating the bound ClpB, and (6) back-slipping occurs. Note 
that the length of the unfolded chain has increased by 270 aa, the 
length of one MBP core, as expected (star). (7) Translocation continues. 
c, d, To create a misfolded or aggregated state, the 4MBP construct was 
unfolded and rapidly relaxed (green trace). This sometimes produced 
non-native structures characterized by being compact and highly 
resistant to force (red trace). The tether was then relaxed to low 
force. e, Subsequent measurement of extension length against time. 
f, Cartoons of event sequence suggested in e: (1) the length remains 
unchanged, for example, owing to waiting for ClpB binding. (2) The 
length increases abruptly by about 600 aa, which is more than one MBP 
core (270 aa), suggesting that ClpB disrupted a non-native (aggregated) 
structure that contained more than one MBP repeat. (3) ClpB 
translocation is observed immediately afterwards. This is consistent 
with the model, because one-step disruption of structures by ClpB 
(pushing) action can yield unfolded polypeptide segments directly on the 
cis side of ClpB that are then available for translocation. Note that 
polypeptide may also be liberated on the other side of the misfolded 
structure, which is not immediately available for translocation. 
Subsequently, further translocation and slipping behaviour is observed. 
Note that the structure becomes almost fully disrupted, as it nears the 
maximum length of 1,440 aa. 
Extended Data Fig. 10 | Loop extrusion as a disaggregation principle. 
Insertion and translocation of loops promotes efficient disaggregation, 
because aggregates may display few accessible polypeptide termini at the 
surface. Translocation by Hsp100s of polypeptides entangled in 
aggregates generates pulling forces that promote their dissociation, 
cooperative disruption of larger structures, and extraction. The ability 
of Hsp100s to switch between translocation modes is relevant to prevent 
pore jamming when encountering structures that resist immediate 
disruption. To dissolve such [...] probably required, involving multiple 
Hsp100 hexamers and other chaperones such as Hsp70, acting at different 
moments in time and at different locations within the aggregate. The 
random non-processive action of Hsp70s probably inherently requires 
multiple Hsp70s working together, in a manner that does not generate 
large pulling forces, while exploiting rapid binding and unbinding. In 
contrast, the processive nature of ClpB translocation enables fast, 
deterministic, and forced dissociation, which further limits 
re-aggregation and degradation when in combination with rapid refolding. 
Software and code 

Data collection: Data from the Lumicks C-Trap set-up were acquired using 
the software provided by Lumicks: Tweez-O-Matic version 36.0, running 
under LabView 11.0. Fluorescence kymographs were acquired with Lumicks 
software Scanary 3.4. 

Data analysis: All analysis was performed using a custom-made package in 
Python 3.5, available online upon request. 
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Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size No statistical methods were used to predetermine sample size. Sample sizes were chosen based on previous experience and published studies 
to assess reproducibility. Experiments were repeated multiple times on multiple substrate molecules, which were sufficient to obtain the 
described error margins. 
Data exclusions Translocation runs whose linear fits yielded r values below 0.8 were not used for translocation speed determination unless otherwise stated.
This criterion was established after observing that some data contained more noise and hence were not suitable for such analysis.


Replication All experiments were replicated multiple times, using different bead pairs and substrate molecules. All attempts at replication were successful.


Randomization Samples were not randomized in the experiments. Randomization was not applicable as samples were allocated according to different 
conditions such as buffer conditions. 


Blinding Experiments were not blinded as the data acquisition and analysis were done in different conditions. 


Reporting for specific materials, systems and methods 


Materials & experimental systems Methods 

n/a | Involved in the study n/a | Involved in the study 
Unique biological materials ChIP-seq 
Antibodies Flow cytometry 
Eukaryotic cell lines MRI-based neuroimaging 
Antibodies


Antibodies used Roche Diagnostics Germany, Anti-digoxigenin bodies, Cat. No. 11333089001, polyclonal antibody from sheep 


Validation The polyclonal antibody from sheep is specific to digoxigenin and digoxin and shows no cross-reactivity with other steroids, such
as human estrogens and androgens.
Elucidating the mechanism of sugar import requires a molecular understanding of
how transporters couple sugar binding and gating events. Whereas mammalian
glucose transporters (GLUTs) are specialists, the hexose transporter from the malaria
parasite Plasmodium falciparum, PfHT1, has acquired the ability to transport both
glucose and fructose sugars as efficiently as the dedicated glucose (GLUT3) and
fructose (GLUT5) transporters. Here, to establish the molecular basis of sugar
promiscuity in malaria parasites, we determined the crystal structure of PfHT1 in
complex with D-glucose at a resolution of 3.6 Å. We found that the sugar-binding site
in PfHT1 is very similar to those of the distantly related GLUT3 and GLUT5 structures.
Nevertheless, engineered PfHT1 mutations made to match GLUT sugar-binding sites
did not shift sugar preferences. The extracellular substrate-gating helix TM7b in
PfHT1 was positioned in a fully occluded conformation, providing a unique glimpse
into how sugar binding and gating are coupled. We determined that polar contacts
between TM7b and TM1 (located about 15 Å from D-glucose) are just as critical for
transport as the residues that directly coordinate D-glucose, which demonstrates a
strong allosteric coupling between sugar binding and gating. We conclude that PfHT1
has achieved substrate promiscuity not by modifying its sugar-binding site, but
instead by evolving substrate-gating dynamics.


P. falciparum relies on a continuous supply of host-derived glucose
during the clinically important asexual stages of growth and replication
within erythrocytes. As a consequence, glucose consumption is
increased about 100-fold in erythrocytes that contain the parasite.
This metabolism is further dependent upon the import of glucose
across the cell membrane of the parasite by the hexose transporter
PfHT1 (Fig. 1a). Owing to its essential role in glucose metabolism,
PfHT1 is a well-recognized target for antimalarial drugs. PfHT1
belongs to the major facilitator superfamily (MFS), the members of
which share a fold that consists of two symmetrical bundles of six
transmembrane (TM) helices, as was first clearly observed in
the structure of lactose permease (LacY). However, PfHT1 clusters
with a different MFS class from that of LacY; it belongs to the subfamily
of sugar porters, which includes the medically relevant GLUT
transporters. In contrast to the GLUT transporters (which show poor
catalytic activity for diverse sugars), PfHT1 shows a broader substrate
specificity. In particular, PfHT1 has acquired the ability to transport
both D-glucose and D-fructose with kinetics (K_M) similar to those of
the dedicated high-affinity D-glucose (GLUT3) and D-fructose (GLUT5)
transporters, respectively. Structures of the related sugar porters
GLUT1, GLUT3 and GLUT5, as well as the Escherichia coli xylose
transporter XylE, have previously been determined. Because
PfHT1 shares only low sequence identity with these transporters
(Extended Data Fig. 1a, b), it has been unclear whether sugar recognition
would be similar. Here we aimed to determine the structure
of PfHT1 to establish the molecular 'rules' that govern its substrate
specificity and inhibition.

Purified PfHT1 was reconstituted into liposomes and showed robust
uptake of radiolabelled D-glucose, D-mannose, D-galactose, D-fructose
and D-xylose, consistent with in vivo analysis (Fig. 1b, Extended Data
Fig. 2a–e). PfHT1 can also transport D-glucosamine (Extended Data
Fig. 2f), as has previously been observed for GLUT2 and the galactose
transporter GalP. PfHT1 kinetics for D-glucose (K_M of 0.80 mM) and
D-fructose (K_M of 9.6 mM) in proteoliposomes were comparable to in vivo
estimates for PfHT1, GLUT3 and GLUT5 (Extended Data Fig. 3a, d).
The turnover rates (k_cat) for D-glucose (19 s⁻¹) and D-fructose (30 s⁻¹) were
further comparable to in vitro estimates for those of GLUT3 (13 s⁻¹)
and GLUT5 (43 s⁻¹), respectively (Extended Data Table 1). We co-crystallized
PfHT1 with D-glucose using the vapour-diffusion method, and
determined the structure by molecular replacement at a resolution
of around 3.6 Å (Fig. 1c, Extended Data Table 2). PfHT1 crystallized as
a dimer, with four molecules in the asymmetric unit (Extended Data
Fig. 4a). The PfHT1 structure is highly similar to the outward-occluded
structure of human GLUT3 (Extended Data Fig. 4b). The extracellular
half-helix TM7b in PfHT1 was found to be more occluded than in
human GLUT3, and matched the position of TM7b in the inward-open
conformation of GLUT1 and GLUT5 (Fig. 1d, Extended Data Fig. 4d).
Nonetheless, PfHT1 was not in an inward-facing state, as access to the
inside was closed. We conclude that PfHT1 has been captured in a previously
unobserved, fully occluded conformation (Fig. 1c, e).


¹Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden. ²Department of Applied Physics, Science for Life Laboratory, KTH Royal Institute of Technology,
Stockholm, Sweden. ³These authors contributed equally: Abdul Aziz Qureshi, Albert Suades. *e-mail: ddrew@dbb.su.se
Fig. 1 | PfHT1 in D-glucose-bound occluded conformation. a, P. falciparum
infected erythrocyte illustrating the localization of PfHT1 within the parasitic
plasma membrane and the GLUT-dependent uptake of glucose and fructose.
GLUT1/5, GLUT1 or GLUT5; NPP, new permeability pathway; RBC, red blood cell;
PVM, parasitophorous vacuole membrane; PPM, parasite plasma membrane.
b, Time-dependent uptake of [14C]D-glucose (black circles) by PfHT1
proteoliposomes. Inset, PfHT1 [14C]D-glucose (cyan trace) and non-specific
uptake in empty liposomes (black trace). Error bars represent mean ± s.e.m. of
n = 3 biologically independent experiments. D-Glu, D-glucose. c, Cartoon
representation of the structure of the PfHT1 D-glucose-bound complex in the
occluded conformation, showing the N-terminal six-transmembrane (6TM)
domain (NTD) (blue), the C-terminal 6TM domain (CTD) (magenta), the
intrahelical domain (ICH) (grey) and D-glucose (stick representation). PfHT1
has a large intracellular salt-bridge network that stabilizes the occluded
conformation, as seen in structures of related sugar porters in outward-facing
conformations (Extended Data Fig. 4c). The structures of sugar porters in
the outward-facing conformation are further stabilized by interactions
between the intracellular helices ICH1, ICH2, ICH3 and ICH4, which collectively
interact with ICH5 to latch the NTD and CTD together. ICH5 has previously
been observed only in the outward-facing conformation. In the occluded
PfHT1 structure, we were unable to model ICH5 (Extended Data Fig. 4e), which
implies that PfHT1 is primed for transition into the inward-facing
conformation. Ext, exterior; int, interior. d, Ribbon representation of the CTD
domain of PfHT1 (magenta), superimposed with TM7 and TM10 gating helices
of outward-facing GLUT3 (RCSB Protein Data Bank code (PDB) 4ZWC, shown in
teal) and inward-facing GLUT5 (PDB 4YB9, shown in light orange). e, Surface
representation of the PfHT1 structure in the occluded conformation with
D-glucose shown as sticks (left), and the corresponding polder omit map (blue
mesh at 5σ) shown for the D-glucose surrounded by the NTD and CTD helices
(right), coloured as in c.


We observed considerable non-protein electron density in the
C-terminal bundle of PfHT1 that corresponded exactly to the sugar-binding
site for D-glucose in human GLUT3 (Figs. 1e, 2a, b, Extended
Data Fig. 4e). Almost all D-glucose hydrogen-bonding residues in human
Fig. 2 | Molecular recognition of D-glucose by PfHT1. a, Cartoon
representation of the PfHT1 sugar-binding site with D-glucose (yellow sticks)
and the interacting residues labelled; residues Q169, Q305, Q306, N311 and
N341 were determined to be essential for sugar transport, as shown in d.
Putative hydrogen bonds are indicated with dotted lines. b, Sugar-binding site
comparison between PfHT1 (yellow sticks) and human GLUT3 (PDB 4ZW9)
(grey, conserved; cyan, non-conserved). The crystallization lipid monoolein
(MO) interacting with TM7b in human GLUT3 is shown in cyan, and the PfHT1
A404 residue that corresponds to E378 in human GLUT3 is shown in dark blue.
Putative hydrogen bonds are indicated with dotted lines. c, The competitive
uptake of [14C]D-glucose by PfHT1 proteoliposomes in the absence (white bar)
and presence of non-labelled D-glucose epimers and homologues (black bars);
note that competitive uptake cannot distinguish between transported and
non-transported sugars. Non-specific uptake in empty liposomes is also shown
(red bar). Data are mean ± s.e.m. of n = 3 biologically independent experiments.
d, Transport activity of PfHT1 mutants for residues in the D-glucose-binding site,
for [14C]D-glucose (black bars) and [14C]D-fructose (white bars). Data are mean ± s.e.m.
of n = 3 biologically independent experiments. e, Comparison of peripheral
D-glucose-binding site between PfHT1 (yellow sticks) and human GLUT3 (grey,
conserved; cyan, non-conserved), and monoolein lipid interacting with human
GLUT3 in cyan. f, Transport activity of PfHT1 mutants for residues in the
peripheral binding site (in which the residue is substituted with alanine or the
equivalent residue in human GLUT3) for [14C]D-glucose (black bars) and [14C]D-fructose
(white bars). Data are mean ± s.e.m. of n = 3 biologically independent
experiments.


GLUT3 were conserved and similarly positioned in PfHT1 (Fig. 2a, b, 
Extended Data Fig. 1c). D-Glucose in PfHT1 was therefore modelled with 
the observed orientation in human GLUT3 and refined with appropriate 
stereochemical restraints (Methods). It was, nonetheless, important to 
validate the binding pose of D-glucose, especially since a crystallization 
lipid interacted with D-glucose in human GLUT3 (Fig. 2b). Forthwith, 
Fig. 3 | Gating helix in PfHT1 enables substrate promiscuity. a, Cartoon
representation of the sugar-binding site in occluded PfHT1 (green sticks)
superimposed with outward-open GLUT5 (PDB 4YBQ, blue sticks), outward-occluded
GLUT3 (PDB 4ZW9, grey sticks) and inward-open GLUT5 (PDB 4YB9,
orange sticks). Asn311 (dotted ellipsoid) is the only residue that clearly
repositions during the entire transport cycle. b, TM7b sequence alignment
between human (h) GLUT1, GLUT2, GLUT3, GLUT4 and GLUT5, rat (r) GLUT5,
E. coli XylE and PfHT1. The red box highlights that the highly conserved
occlusion-forming tyrosine residues are replaced by serine and asparagine in
PfHT1. The yellow shading highlights conserved residues. c, Cartoon
representation of PfHT1 extracellular gating interactions between TM7b
(magenta) and TM1 (blue). Potential hydrogen-bond interactions are indicated
by dotted lines and prominent residue side chains are labelled. d, Transport
activity for TM1-TM7b interacting-residue mutants for [14C]D-glucose (black
bars) and [14C]D-fructose (white bars). Residues K51, N311, N316, S317 and N318
were determined to be essential for sugar transport; their relative positions are
shown in c. Data are mean ± s.e.m. of n = 3 biologically independent
experiments. e, Gating interactions for the D-glucose-bound outward-occluded
structure of human GLUT3 (grey) (top) and the D-glucose-bound
occluded structure of PfHT1 (bottom). Blue, NTD; magenta, CTD; shaded lines
represent the distribution of gating distances from n = 3 independent 1-μs
molecular dynamics simulations (Methods) and non-shaded lines represent
their respective mean distance. Representative side views for human GLUT3
and PfHT1 at the start and end of the simulation time are also shown.


[14C]D-glucose uptake by PfHT1 proteoliposomes was performed in the
presence of unlabelled epimers of D-glucose and homologous sugars
(Fig. 2c, Extended Data Fig. 5a, b). Most revealingly, D-gulose (which
differs from D-galactose in only the C3-hydroxyl orientation) displayed
fivefold-poorer competition for [14C]D-glucose uptake than D-galactose.
Consistently, D-allose (the C3 epimer of D-glucose) was clearly weaker
at competing for [14C]D-glucose uptake than either the C4 epimer
D-galactose or the C2,C4 epimer D-talose. Moreover, D-mannose and
2-deoxyglucose, which differ from D-glucose in their C2 positions, were
just as competitive for [14C]D-glucose uptake as D-glucose. Overall,
the C3-hydroxyl group orientation was determined to be the most
critical for D-glucose recognition, followed by the C4-hydroxyl group
orientation.

In the PfHT1 structure, the C3- and C4-hydroxyl groups hydrogen
bond to Asn311 and Gln306, whereas the C1- and C2-hydroxyl groups
hydrogen bond to Gln169, Gln305 and Trp412 (Fig. 2a). Consistently,
whereas alanine substitutions of Trp412 and Gln305 residues retained
100% and 22% of wild-type activity (respectively), Gln306 and Asn311
alanine mutations were less than 10% active (Fig. 2d, Extended Data
Fig. 6a). In the observed orientation, the C1- and C2-hydroxyl groups
face the interior and the C3- and C4-hydroxyl groups face the exterior,
which is consistent with the binding pose that is biochemically
predicted for GLUT1. Because E. coli XylE also binds D-glucose in a
manner similar to that of PfHT1 and GLUT3, the D-glucose binding
mode appears to be evolutionarily conserved (Extended Data Fig. 6d).
Notably, the strictly conserved Gln169 (in TM5) is the only N-terminal-bundle
residue that coordinates D-glucose and is, as expected, also
critical for transport (Fig. 2d).

To investigate PfHT1 promiscuity, we focused on the ability of PfHT1
mutants to transport D-glucose versus D-fructose, which are the most
physiologically relevant sugars and have sevenfold differences in specificity
constants (k_cat/K_M) (Extended Data Fig. 3f, g). We substituted
Ala404 with glutamate, as this residue was the only obvious difference
between the human GLUT3 and PfHT1 sugar-binding sites (Fig. 2b).
However, the Ala404Glu mutant retained D-glucose and D-fructose
transport (Fig. 2d, Extended Data Fig. 6a, e, f). We next investigated
whether residues peripheral to the main sugar-binding site might
influence substrate selectivity (Fig. 2e, f). Consequently, we generated
His168Asn, Val314Phe and Ala439Asn single mutants of PfHT1 to
mimic human GLUT3; in these mutants, transport of both D-glucose
and D-fructose was again similarly impaired (Fig. 2f, Extended Data
Fig. 6b). As a comparison, we also assayed alanine substitutions of the
conserved residues Ile310, Phe403, Asn435 and Trp436; these substitutions
also led to impaired transport of both sugars, with the exception
of Asn435Ala, which selectively abolished D-fructose transport (Fig. 2f,
Extended Data Fig. 6b).
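
The sevenfold difference in specificity constants can be checked against the proteoliposome values quoted earlier in the text (K_M of 0.80 mM and 9.6 mM, and turnover rates of 19 s⁻¹ and 30 s⁻¹, for D-glucose and D-fructose, respectively). The following is a minimal illustrative sketch of that arithmetic; the dictionary layout and the function name are assumptions made for this example and are not part of the authors' analysis.

```python
# Kinetic parameters for PfHT1 in proteoliposomes, as quoted in the main text.
# The data layout and names below are illustrative assumptions.
KINETICS = {
    "D-glucose": {"kcat_per_s": 19.0, "km_mM": 0.80},
    "D-fructose": {"kcat_per_s": 30.0, "km_mM": 9.6},
}


def specificity_constant(kcat_per_s, km_mM):
    """Return k_cat / K_M in units of mM^-1 s^-1."""
    return kcat_per_s / km_mM


glucose = specificity_constant(**KINETICS["D-glucose"])
fructose = specificity_constant(**KINETICS["D-fructose"])
ratio = glucose / fructose

print(f"D-glucose  k_cat/K_M = {glucose:.2f} mM^-1 s^-1")
print(f"D-fructose k_cat/K_M = {fructose:.2f} mM^-1 s^-1")
print(f"glucose/fructose ratio = {ratio:.1f}")
```

With these inputs the ratio is about 7.6, in line with the roughly sevenfold difference stated in the text.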

As we were unable to rationalize the sugar preferences of PfHT1 on
the basis of human GLUT3, we extended our comparison to rat GLUT5,
which revealed that the Trp412 in PfHT1 is replaced by alanine in GLUT5
(Fig. 2b, Extended Data Fig. 6g). However, the Trp412Ala mutant
retained D-glucose transport but had severely reduced D-fructose transport
(Fig. 2d, Extended Data Fig. 6h). Taken together, our experiments
suggest that D-fructose transport in PfHT1 requires almost the same
set of sugar-binding residues as does D-glucose transport, but that the
former shows greater sensitivity to mutagenesis, probably because
it interacts with lower affinity. Indeed, the Asn435 and Trp412 alanine
mutations that selectively affect D-fructose transport (Fig. 2d, f)
nevertheless have reduced D-glucose turnover and are not found in
transporters that are specific for D-fructose (Extended Data Table 1,
Extended Data Figs. 1c, 6a, b). It was thus unclear how PfHT1 robustly
transports different sugars.

Despite the high levels of structural similarity, two antimalarial compounds,
C3361 and MMV009085, have previously been discovered to
have 19- to 250-fold higher selectivity for PfHT1 over human GLUT1
and GLUT5, and over GLUT1–4, respectively. The compound C3361 is a
D-glucose derivative with an undec-10-en- addition at the C3-hydroxyl
position; aliphatic chain additions to the C3 hydroxyl showed the
strongest inhibition (followed by additions at the C4 hydroxyl),
whereas additions to the C1, C5 or C6 positions showed no inhibition. In
the occluded PfHT1 structure, there is a narrow hydrophobic vestibule
that would be accessible only from the C3- and C4-hydroxyl positions
(Extended Data Fig. 7a). In the glucose-bound human GLUT3 structure,
a crystallization lipid consistently occupies this site (Extended Data
Fig. 7a). Although the binding mode of MMV009085 was previously
unknown, we conclude that it also binds in the sugar-binding pocket
because, similar to cytochalasin B, PfHT1 inhibition was found to
be dependent on Trp412 (Extended Data Fig. 7b). The malarial box
inhibitor MMV009085 is a symmetric tetracyclic compound with two
butanol moieties (Extended Data Fig. 7c). Consistent with inhibition
requiring interaction with the hydrophobic-side vestibule, a compound
synthesized without the butanol moieties was unable to inhibit PfHT1
(Extended Data Fig. 7c). Thus, the occluded structure highlights how
an off-site vestibule could be targeted to improve selective inhibition
of PfHT1.

By combining the occluded PfHT1 structure with previous sugar-porter
structures, we can reconstruct what is arguably the most complete
MFS transporter cycle known to date (Supplementary Video 1).
During the rocker-switch alternating-access mechanism, the N- and
C-terminal bundles clearly rearrange around the centrally located
substrate-binding site. Global rearrangements are further coupled with
local rearrangements of the TM7b and TM10a half-helices that gate
access to the sugar-binding site from the outside and inside, respectively.
Side-chain positioning of almost all sugar-binding residues
is virtually unchanged during the entire transport cycle (Fig. 3a),
which implies that sugar translocation must be primarily driven by conformational
selection. The residue Asn311 in the extracellular substrate-gating
helix TM7b is the only residue that moves substantially during
the transport cycle. Specifically, in the transition to a sugar-bound
occluded state, Asn311 moves inward to form hydrogen bonds with
the critically important C3- and C4-hydroxyl groups (Fig. 3a, Extended
Data Fig. 8a). In human GLUT3, two highly conserved TM7b tyrosine
residues move in concert with Asn311 to occlude sugar exit (Extended
Data Fig. 8a). Thus, the strictly conserved Asn311 is probably a generic
interaction site that couples sugar binding to TM7b gating.

The occlusion-forming tyrosine residues that are strictly conserved
in the GLUT proteins are replaced in PfHT1 by the polar residues Ser315
and Asn316, and the substitution of either with tyrosine abolished transport
(Fig. 3b, d, Extended Data Fig. 6c). Asn316 extended towards TM1
and seemed to form polar interactions with Lys51 (Fig. 3c). The mutation
of Lys51, which is located about 15 Å from D-glucose, to alanine or
glutamine also rendered PfHT1 non-functional (Fig. 3d, Extended Data
Fig. 6c). TM7b and TM1 interactions were further observed between the
backbone oxygen of Asn48 in TM1 and Asn318 in TM7b, which formed
hydrogen bonds with Ser317 (Fig. 3c, Extended Data Fig. 6c). Ser317Ala
and Asn318Ala (which affect TM7b) mutants also rendered PfHT1 non-functional,
whereas the Asn48Ala mutant had severely reduced activity
(Fig. 3d, Extended Data Fig. 6c). By contrast, alanine mutations of Ser315
and Glu319 residues (which point away from the TM1-TM7b interface)
retained robust transport activity (Fig. 3c, d). The functional Ser315Ala
and Glu319Ala mutants nevertheless show a reduction in turnover,
mostly for D-glucose (Extended Data Table 1). To probe TM1 and TM7b
interactions further, we compared 1-μs molecular dynamics simulations
of human GLUT3 and PfHT1 structures. In the presence or absence of
D-glucose, TM7b in human GLUT3 was found to be very mobile and it
consistently moved far enough apart from TM1 to enable the release of
D-glucose (Fig. 3e, Extended Data Fig. 8b). By contrast, TM7b in PfHT1
mostly retained an occluded conformation (Fig. 3e, Extended Data
Fig. 8c) but was somewhat more mobile in the absence of D-glucose.
For the majority of the simulation time, TM1 and TM7b contacts were
maintained between the Lys51 and Asn316 residues (Extended Data
Fig. 8d). Taken together, our findings show that TM1 and TM7b gating
interactions and dynamics appear to be as important to sugar
transport kinetics as the residues in the sugar-binding site.
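
The statement that the Lys51-Asn316 contact is maintained for most of the simulation time amounts to a per-frame distance check between two atoms. The sketch below illustrates one way such a contact occupancy could be computed; it is not the authors' analysis pipeline. The atom choices (Lys51 NZ, Asn316 OD1), the 3.5 Å cutoff and the four-frame toy trajectory are all assumptions introduced for illustration.

```python
import math


def contact_occupancy(coords_a, coords_b, cutoff=3.5):
    """Return per-frame distances (Å) and the fraction of frames within `cutoff`.

    coords_a and coords_b hold one (x, y, z) tuple per trajectory frame.
    The 3.5 Å default is a commonly used hydrogen-bond distance cutoff and is
    an assumption here, not a value taken from the paper.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(a, b) for a, b in zip(coords_a, coords_b)]
    in_contact = sum(d <= cutoff for d in distances)
    return distances, in_contact / len(distances)


# Hypothetical four-frame trajectory (coordinates in Å), for illustration only.
lys51_nz = [(0.0, 0.0, 0.0)] * 4
asn316_od1 = [(3.0, 0.0, 0.0), (3.2, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 2.9)]

distances, occupancy = contact_occupancy(lys51_nz, asn316_od1)
print([round(d, 2) for d in distances])
print(f"contact maintained in {occupancy:.0%} of frames")
```

In practice the coordinates would be read from the trajectory files with a dedicated analysis package; plain tuples are used here to keep the example self-contained.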

The formation of the occluded state is an important intermediate 
for understanding substrate coupling. A statistical comparison of the 
PfHT1 structure supports its designation as an occluded conformation 
that links previously determined sugar-porter states (Fig. 4, Extended 
Data Fig. 9a–c). For a sugar to be a substrate, it not only has to bind but
Fig. 4 | The conformational-selection-driven rocker-switch mechanism for
facilitative sugar transport. The fully occluded conformation of PfHT1 is the
last remaining state to be observed within the rocker-switch alternating-access
mechanism of MFS transporters that belong to the sugar-porter subfamily. The
observed structural states are shown as surface transversal cross-sections and
clockwise from the top left: outward-open rat GLUT5 (PDB 4YBQ), outward-occluded
human GLUT3 (PDB 4ZW9), fully occluded PfHT1, inward-occluded
XylE (PDB 4JA3) and inward-open bovine GLUT5 (PDB 4YB9). In the forward or
reverse direction, the attainment of the occluded intermediate represented by
PfHT1 is required. The principal component (PC) analysis from the conserved
MFS ensemble core (n = 17 structures from 16 PDB codes; Methods) is shown
below the structures; this analysis yields a major PC1 component (65% of the
total structural variance) that tracks the rocker-switch global motion.
Projections are coloured according to the angle between tandem repeats (TR)
(Methods).


must also induce formation of the occluded state, which is a prerequisite
for alternating access. In the transition from the outward-occluded
to the fully occluded conformation, TM7b breaks and extends closer to
TM1, adopting the position seen in inward-facing structures (Extended
Data Figs. 4d, 8a). The fact that the occluded state can be observed by
crystallography implies this state is likely to be more stable in PfHT1
than it is in GLUT proteins, consistent with the comparative molecular
dynamics simulations and the additional polar interactions observed
between the TM7b and TM1 helices. Rather than modifying the chemistry
of the sugar-binding pocket, we conclude PfHT1 has evolved TM7b
substrate-gating dynamics so that it can transition into the occluded
state more easily. In this way, PfHT1 is a more robust and promiscuous
sugar transporter than the GLUT transporters, as it is less sensitive
to a specific type of sugar being bound. Certainly, substrate promiscuity
would be an advantage to P. falciparum, which is able to use D-glucose
or D-fructose as a sole source of energy.

Although TM7b substrate-gating interactions and dynamics might be
exaggerated in PfHT1, we think they are also of functional importance
to GLUT proteins. Indeed, the QLS motif (which, prior to structural
information, was thought to confer D-fructose specificity by acting as
a selectivity filter) is not located in the main sugar-binding pocket, but
is instead juxtaposed to the TM7a and TM7b breakpoint. Likewise,
an isoleucine-to-valine mutation in GLUT7 that abolishes D-fructose
transport (while leaving D-glucose transport unaffected) is not located
in the sugar-binding pocket, but is instead between TM7b and TM10
half-helices. The importance of fine-tuned sugar-binding and gating
would further explain why XylE binds D-glucose in a manner similar to
that of PfHT1 and human GLUT3 (Extended Data Fig. 6d) but is incapable
of transporting the sugar. To conclude, PfHT1 highlights that substrate-gating
dynamics is probably a greater determinant for evolving sugar
translocation than previously thought, and should be considered more
closely in establishing transport mechanisms in general.
Methods


No statistical methods were used to predetermine sample size. The 
experiments were not randomized and investigators were not blinded 
to allocation during experiments and outcome assessment. 


Construct design and cloning 

PfHT1 was cloned into the GAL-inducible vector pDDGFP2. The resulting
construct consisted of: residues 1-5 from rat GLUT5 to facilitate
recombinant expression, PfHT1 residues 20-504 (out of 504) (UniProt
accession number: O97467), followed by a tobacco etch virus
(TEV) cleavage site and a C-terminal GFP-His tag. The vector was
transformed into the Saccharomyces cerevisiae strain FGY217 (MATa,
ura3-52, lys2Δ201 and pep4Δ) as previously described. The resulting
translated sequence following TEV digestion, with the non-PfHT1
residues from GLUT5 and residues of the TEV cleavage site underlined,
is: MEKEDSGFFSTSFKYVLSACIASFIFGYQVSVLNTIKNFIVVEFEWCKG 
EKDRLNCSNNTIQSSFLLASVFIGAVLGCGFSGYLVQFGRRLSLLITYNFFFLV 
SILTSITHHFHTILFARLLSGFGIGLVT VSVPMYISEMTHKDKKGAYGVMHQL 
FITFGIFVAVMLGLAMGEGPKADSTEPLTSFAKLWWRLMFLFPSVISLIGIL 
ALVVFFKEETPYFLFEKGRIEESKNILKKIYET DNVDEPLNAIKEAVEQNESA 
KKNSLSLLSALKIPSYRYVIILGCLLSGLQQFTGINVLVSNSNELYKEFLDSH 
LITILSVVMTAVNFLMTFPAIYIVEKLGRKTLLLWGCVGVLVAYLPTAIANEI 
NRNSNFVKILSIVATFVMIISFAVSYGPVLWIYLHEMFPSEIKDSAASLASLV 
NWVCAIIVVFPSDIIIKKSPSILFIVFSVMSILTFFFIFFFIKETKGGEIGTSPYIT 
MEERQKHMTKSVVENLYFQ. 


Large-scale production and purification 
For large-scale production, 24 l of S. cerevisiae FGY217 cells were grown
in -URA medium containing 0.1% (v/v) glucose at 30 °C in 2-l shaking
flasks. Protein production was induced at an optical density at 600 nm
(OD600) of 0.6 by the addition of galactose to a final concentration of
2% (w/v). After 24 h incubation at 30 °C, the cells were collected, resuspended
in buffer containing 50 mM Tris-HCl pH 7.6, 1 mM EDTA and 0.6 M
sorbitol, and lysed by mechanical disruption as previously described.
Membranes were isolated by ultracentrifugation at 4 °C and 195,000g
for 2 h, homogenized in 20 mM Tris-HCl pH 7.5, 0.3 M sucrose and 0.1 mM
CaCl2, flash-frozen in liquid nitrogen and stored at −80 °C. The PfHT1-containing
membranes were solubilized for 2 h at 4 °C in equilibration
buffer, consisting of 1x PBS, 150 mM NaCl, 10% (v/v) glycerol and 1% (w/v)
n-dodecyl-β-D-maltopyranoside (DDM; Glycon). Non-solubilized membranes
were removed by ultracentrifugation at 195,000g for 45 min,
and the cleared supernatant was incubated with 15 ml of Ni2+-nitrilotriacetate
affinity resin (Ni-NTA; Qiagen) for 2 h at 4 °C in the presence of
40 mM imidazole under mild agitation. The resin was transferred to a
30-ml Eco-column (Bio-Rad) and washed with 300 ml of equilibration
buffer containing 0.1% (w/v) n-undecyl-β-D-maltopyranoside (UDM;
Anatrace) and 50 mM imidazole. The immobilized protein was eluted
in 30 ml of equilibration buffer containing 0.1% (w/v) UDM and 250 mM
imidazole. The eluate was incubated with equimolar TEV protease at
4 °C overnight to cleave the GFP-His tag during dialysis performed
against 3 l of dialysis buffer, consisting of 20 mM Tris-HCl pH 7.5, 150
mM NaCl and 0.08% (w/v) UDM. The dialysed and digested sample
was loaded onto a 5-ml HisTrap column (GE Healthcare) equilibrated
with dialysis buffer, and the PfHT1-containing flow-through was collected
and concentrated. The concentrated solution was applied onto a
PD-10 desalting column (Sephadex G-25, GE) pre-equilibrated in dialysis
buffer, and the initial 1.6 ml of the flow-through was collected, concentrated
to 6–8 mg ml⁻¹ and used for crystallization experiments. For
proteoliposome-based transport assays, the on-column immobilized PfHT1
was washed, eluted and dialysed in equilibration buffer and dialysis
buffer containing 0.1% (w/v) and 0.03% (w/v) DDM, respectively, and
concentrated to 2 mg ml⁻¹.

PfHT1 mutants were generated by overlap PCR, cloned into the
pDDGFP2 vector, and overexpressed in 6-l cultures as previously
described for the wild type. The PfHT1 mutants were purified as
described for the wild type, but without GFP-His tag removal by TEV
protease cleavage. The purified PfHT1-GFP fusions were concentrated
to 2 mg ml⁻¹ and judged to be monodisperse by size-exclusion chromatography
using an Enrich 650 10 x 300 column in buffer containing 20
mM Tris-HCl pH 7.5, 150 mM NaCl and 0.03% (w/v) DDM.


Transport activity of PfHT1 reconstituted into liposomes 

Total bovine brain lipid extracts (Sigma-Aldrich) and cholesteryl-hemisuccinate
(CHS) (Sigma-Aldrich) powder were mixed in buffer
containing 10 mM Tris-HCl pH 7.5 and 2 mM MgSO4 to final concentrations
of 30 and 6 mg ml⁻¹, respectively. The lipid mixture was subjected
to multiple rounds of freeze-thaw cycles by flash-freezing in liquid
nitrogen and thawing at room temperature, interspersed with sonication.
The lipid mixture was then spun down at 16,000g for 15 min and
the supernatant containing small unilamellar vesicles was collected.
To make proteoliposomes, 10 μg of purified PfHT1 was added to 500
μl of unilamellar vesicles, flash-frozen and thawed at room temperature.
Large unilamellar proteoliposomes were prepared by extrusion
(LiposoFast, Avestin; membrane pore size, 400 nm). Transport assays
for PfHT1 mutants were carried out as GFP fusions and compared with
PfHT1 wild type prepared in the same manner.

For the transport time-course experiments, 15 μl of prepared proteoliposomes
were diluted into 45 μl of external buffer consisting of 10
mM Tris-HCl pH 7.5, 2 mM MgSO4 and one of: [14C]D-glucose (30 μM) (American
Radiolabelled Chemicals and Moravek Biochemicals), [3H]D-xylose
(0.3 μM) (American Radiolabelled Chemicals), [14C]D-mannose (30 μM)
(Moravek Biochemicals), [14C]D-galactose (30 μM) (American Radiolabelled
Chemicals), [14C]D-fructose (6.0 μM) (Moravek Biochemicals) or
[3H]D-glucosamine (0.3 μM) (Perkin Elmer). The reaction was stopped
by the addition of 1 ml of 10 mM Tris-MgSO4 buffer, followed by
rapid filtering through a 0.22-μm filter (Millipore). The on-filter collected
proteoliposomes were washed with 6 ml of buffer containing
10 mM Tris-HCl pH 7.5 and 2 mM MgSO4, transferred to scintillation vials
and emulsified in 5 ml of Ultima Gold scintillation liquid (Perkin Elmer)
before scintillation counting (TRI-CARB 4810TR 110 V; Perkin Elmer).

The proteoliposomes for [¹⁴C]D-glucose competitive-uptake assays
were prepared as described for the time-course experiments.
[¹⁴C]D-glucose competitive uptake was measured at 30 s in external
buffer containing unlabelled sugars at a final concentration of 50 mM.

For kinetic analysis, the K_M and V_max values for D-glucose and
D-fructose transport by PfHT1 and mutants were determined by measuring
the initial transport velocities (D-glucose at 20 s, D-fructose at
60 s) at increasing concentrations of these sugars in buffer
containing stoichiometric amounts of [¹⁴C]D-glucose or
[¹⁴C]D-fructose. The radioactivity recorded from empty liposomes was
subtracted from the decay counts recorded for the transported sugars,
and the result was fitted to Michaelis-Menten kinetics by nonlinear
regression in GraphPad Prism 7.0. K_M and V_max values for D-mannose,
D-galactose and D-glucosamine for PfHT1 were determined in the same
way, with initial transport velocities recorded at 20 s for D-mannose
and 60 s for D-galactose and D-glucosamine. The time course and
kinetics of PfHT1 and PfHT1-GFP were measured to be comparable. To
calculate k_cat, the amount of transported sugar was therefore
normalized by the fraction of PfHT1-GFP fusion incorporated into
liposomes, which was calculated by fluorescence-detection
size-exclusion chromatography (FSEC) analysis of proteoliposomes
solubilized in 3% (w/v) DDM. The orientation of PfHT1-GFP in liposomes
was estimated by incubating 150 µl of proteoliposomes with and without
TEV protease at a ratio of 1:3 (w/w) overnight at 4 °C; from the
fraction of cleaved GFP estimated by in-gel fluorescence, a ratio of
about 60:40 (outside:inside) was calculated.
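
The background subtraction and Michaelis-Menten fit described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fits in this work were done in GraphPad Prism 7.0, whereas the sketch uses a simple profile least-squares search (for each trial K_M the best V_max has a closed form), and the concentrations, background counts and function names are hypothetical.

```python
import math

def michaelis_menten(s, vmax, km):
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)

def _profile_sse(log_km, conc, rate):
    """For a fixed K_M (given as log10), return (squared error, best V_max)."""
    km = 10.0 ** log_km
    f = [s / (km + s) for s in conc]
    vmax = sum(v * fi for v, fi in zip(rate, f)) / sum(fi * fi for fi in f)
    sse = sum((v - vmax * fi) ** 2 for v, fi in zip(rate, f))
    return sse, vmax

def fit_michaelis_menten(conc, rate, lo=-3.0, hi=3.0, grid=601):
    """Least-squares fit: coarse scan of log10(K_M), then golden-section search."""
    step = (hi - lo) / (grid - 1)
    logs = [lo + i * step for i in range(grid)]
    best = min(range(grid), key=lambda i: _profile_sse(logs[i], conc, rate)[0])
    a = logs[max(best - 1, 0)]
    b = logs[min(best + 1, grid - 1)]
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if _profile_sse(c, conc, rate)[0] < _profile_sse(d, conc, rate)[0]:
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    log_km = (a + b) / 2.0
    _, vmax = _profile_sse(log_km, conc, rate)
    return vmax, 10.0 ** log_km

# Hypothetical, noise-free example: synthetic signal plus a constant
# empty-liposome background that is subtracted before fitting.
conc_mM = [0.1, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0]
background = [50.0] * len(conc_mM)
signal = [michaelis_menten(s, 21.1, 0.8) + b for s, b in zip(conc_mM, background)]
rates = [x - b for x, b in zip(signal, background)]
vmax_fit, km_fit = fit_michaelis_menten(conc_mM, rates)
```

With real, noisy data the recovered parameters carry uncertainty, which is why the reported values are given with standard errors.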

For inhibition assays, 15 µl of proteoliposomes containing wild-type
PfHT1 or the Trp412Ala mutant was diluted into 43 µl of buffer
consisting of 10 mM Tris-HCl pH 7.5, 2 mM Tris-MgSO₄ that had been
pre-incubated for 1 h in either 4% (v/v) DMSO or 4% (v/v) DMSO with
115 µl of the tested compounds for inhibition; C3361 was synthesized
by BOC Sciences and the MMV009085 homologue was supplied by Mcule.
Transport was initiated by the addition of 2 µl of [¹⁴C]D-glucose at
40 µM final concentration. The reaction was stopped by the addition of
1 ml of 10 mM Tris-MgSO₄ buffer, followed by rapid filtering through a
0.22-µm filter (Millipore). The proteoliposomes collected on the
filter were washed with 6 ml of buffer containing 10 mM Tris-HCl
pH 7.5 and 2 mM MgSO₄, transferred to scintillation vials and
emulsified in 5 ml of Ultima Gold scintillation liquid (Perkin Elmer)
before scintillation counting (TRI-CARB 4810TR 110 V; Perkin Elmer).

The radioactivity recorded for empty liposomes at 4% (v/v) DMSO
was subtracted from each tested condition. Half-maximal inhibitory
concentration (IC50) values were obtained by fitting a one-phase decay
model by nonlinear regression in GraphPad Prism 7.0. All transport
results are presented as mean values (n = 3) with their corresponding
standard errors.
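
As a rough illustration of the one-phase decay model, the sketch below uses the standard form Y = (Y0 − plateau)·exp(−K·X) + plateau. Reading the half-maximal concentration as ln 2 / K is an assumption about how the IC50 follows from the fitted rate constant, and the parameter values are hypothetical.

```python
import math

def one_phase_decay(x, y0, plateau, k):
    """Signal at inhibitor concentration x for a one-phase exponential decay."""
    return (y0 - plateau) * math.exp(-k * x) + plateau

def half_maximal_concentration(k):
    """Concentration at which the signal is halfway between y0 and the plateau."""
    return math.log(2.0) / k

# Hypothetical parameters: uptake falls from 100% towards a 10% plateau.
k_per_uM = 0.05
ic50_uM = half_maximal_concentration(k_per_uM)
signal_at_ic50 = one_phase_decay(ic50_uM, y0=100.0, plateau=10.0, k=k_per_uM)
```

At `ic50_uM` the modelled signal sits midway between the uninhibited value and the plateau, which is the property that defines the half-maximal concentration here.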


Crystallization and structure determination of PfHT1 

Crystals of PfHT1 in complex with D-glucose were grown at 4 °C using
the hanging-drop vapour-diffusion method. D-glucose was added to
purified PfHT1 protein at 8 mg ml⁻¹ to a final concentration of 50 mM.
One microlitre of this solution was mixed 1:1 with reservoir solution
consisting of 0.1 M MES pH 6.5, 0.1 M MgCl₂, 26-30% (w/v) PEG 300 and
0.2% (w/v) n-nonyl-β-D-glucopyranoside (NG, Anatrace). Crystals
appeared within 1 week in 26% PEG 300 and were dehydrated by
equilibrating the drops against 500 µl of reservoir solution
containing increasing concentrations of PEG 300, in steps of 2% (w/v),
up to a final concentration of 32% (w/v). Crystals were subsequently
collected, flash-frozen and stored in liquid nitrogen.

X-ray diffraction data from PfHT1 crystals were collected at 100 K at
the European Synchrotron Radiation Facility (ESRF) on beamlines
ID30A-3 and ID23-1. Two datasets from different crystals were indexed,
integrated and scaled together using XDS before merging using
Aimless. Initial phases for PfHT1 were obtained by molecular
replacement using phenix.mr_rosetta, with the outward-facing occluded
structure of human GLUT3 (PDB 4ZW9) as the input search model. There
are four PfHT1 molecules in the asymmetric unit. Structure refinement
was carried out using phenix.refine and autoBUSTER with local
non-crystallographic symmetry (NCS), one translation-libration-screw
rotation (TLS) group per chain and external constraints to human
GLUT3, interspersed with manual model building in Coot. The
Ramachandran statistics are 92.6% favoured, 6.91% allowed and 0.93%
outliers. Other data collection and refinement statistics are
presented in Extended Data Table 2. Structural alignments were
performed using the align command of PyMol (http://www.pymol.org/) on
Cα coordinates.


Molecular dynamics simulations 

The starting models used for molecular dynamics simulations were
human GLUT3 (PDB 4ZW9) and chain C of PfHT1. The cytosolic loops
of PfHT1 connecting TM5 to TM6 and TM9 to TM10, as well as ICH5,
were modelled using MODELLER version 9.21 before simulations. Six
simulation systems were constructed (three for PfHT1 and three for
GLUT3), each of which consisted of the protein embedded in a POPC
bilayer. To do this, six lipid configurations were generated using the
CHARMM-GUI membrane builder, in which the protein (and ligands,
if applicable) was embedded. These systems were then solvated in
150 mM NaCl. Details of each simulation replica can be found in
Supplementary Table 1.

All systems underwent energy minimization using steepest descent.
Equilibration molecular dynamics was then performed for a total of
375 ps, gradually relaxing positional restraints on protein, POPC
lipids and, when relevant, ligands. The duration of each production
molecular dynamics simulation can be found in Supplementary Table 1.
Simulations were carried out under periodic boundary conditions, and
production molecular dynamics was carried out using 2-fs time steps.
The temperature and pressure were maintained at 303.15 K and 1 bar
using the Berendsen thermostat and barostat, respectively. Pressure
coupling was semi-isotropic, with a time constant of 5 ps and a
compressibility of 4.5 × 10⁻⁵ bar⁻¹. Temperature coupling was
performed using three separate groups for protein, lipids and solvent.
Hydrogen bonds were constrained using the linear constraint solver
(LINCS). Electrostatic interactions were modelled with a 1.2-nm
cutoff, with a switching function between 1.0 and 1.2 nm. Long-range
electrostatics were calculated using particle mesh Ewald (PME).
All-atom molecular dynamics simulations were performed using Gromacs
2018.1. Interactions were modelled with the CHARMM36m (protein, lipids
and ions) and TIP3P (water) force fields.


Analysis of simulations and protein morphing 

Analysis of the molecular dynamics simulations was performed using
the Gromacs analysis tools gmx rmsf and gmx pairdist, for root mean
square fluctuation and gating-residue distance calculations,
respectively. Gate distance was determined from the centres of mass of
the residues that remained closest around the extracellular gate
during the simulations: residues Val44 to Ile50 and Asn311 to Ser317
for PfHT1, and residues Thr28 to Pro34 and Asn286 to Ser292 for human
GLUT3. The gate distances plotted are the means of the three replicas,
indicated by darkened lines (Fig. 3e). Python scripts were written to
parse and plot relevant data. Figure generation was performed using
PyMol (https://pymol.org/2/).
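
A minimal sketch of the gate-distance calculation is given below, assuming that the distance is taken between the centres of mass of the two residue groups and then averaged frame by frame over the replicas. The function names and toy coordinates are hypothetical; the actual analysis used gmx pairdist, and the sketch ignores periodic-boundary wrapping.

```python
import math

def centre_of_mass(coords, masses):
    """Mass-weighted mean position of a group of atoms (coords as (x, y, z))."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for c, m in zip(coords, masses)) / total
        for axis in range(3)
    )

def gate_distance(group_a, masses_a, group_b, masses_b):
    """Distance between the centres of mass of two residue groups in one frame."""
    return math.dist(centre_of_mass(group_a, masses_a),
                     centre_of_mass(group_b, masses_b))

def mean_over_replicas(series):
    """Frame-by-frame mean of equal-length distance series, one per replica."""
    return [sum(values) / len(values) for values in zip(*series)]

# Toy frame: two equal-mass atoms on one side of the gate, one atom on the other.
d = gate_distance([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0],
                  [(1.0, 3.0, 4.0)], [12.0])
averaged = mean_over_replicas([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```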

Morphing between structural states for movie generation was performed
using PyMol. From the PfHT1 structure presented here, chain C
was used to generate models in the following resolved conformations:
outward-open (GLUT5, PDB 4YBQ), outward-occluded (GLUT3, PDB
4ZW9), inward-occluded (XylE, PDB 4JA3) and inward-open (GLUT5,
PDB 4YB9). Structural alignments of the proteins were carried out
in PyMol, and subsequent model generation in MODELLER. The
N termini of the respective models were superimposed and morphed
between states, and the video was made using PyMol.


Sugar-porter principal component analysis 

Principal component analysis (PCA) is a statistical technique to reveal
dominant patterns. Diagonalization of the covariance matrix of a
system of variables renders the major axes of statistical variance, or
principal components, thus mapping complex multidimensional data
into a few coordinates, which contain the major trends that explain
the statistical variation. For the sugar-porter structures, a set of
near-intact structures sharing 30% homology with PfHT1, that is,
eukaryotic GLUT structures (GLUT1, GLUT3 and GLUT5) along with the
E. coli XylE transporter (16 PDB codes in total: PfHT1, 6RW3; XylE,
4GBZ, 4GBY, 6N3I, 4JA3 and 4JA4; GLUT1, 4PYP, 5EQI, 5EQH and 5EQG;
GLUT3, 5C65, 4ZWC, 4ZWB and 4ZW9; and GLUT5, 4YB9 and 4YBQ), were
aligned to extract the common structural fold, mostly formed by
conserved helices (353 residues) (Extended Data Fig. 9a). Missing
residues in some of the structures were rebuilt with MODELLER, making
sure that the positions of the corresponding structural elements were
strictly kept for the core alignment. The structural ensemble was
aligned to the structure of GLUT5 in an open outward-facing
conformation (root mean square deviation (r.m.s.d.) of 2.7 ± 1.2 Å)
and used to compute the covariance matrix versus this reference; that
is, the mean-square deviations in atomic coordinates from their mean
position (diagonal elements) and the correlations between their
pairwise fluctuations (off-diagonal elements). The covariance matrix
was diagonalized to obtain a set of eigenvectors, or principal
components, ordered according to their eigenvalues with decreasing
variance, from those representing the largest-scale motions down to
the smallest fluctuations in atomic coordinates. Within this
framework, any structure i is characterized by its scalar product
projections onto the conformational space defined by the major
components, PC_k (k = 1, 2 ... n):
$$\mathrm{PC}_k = T_{i-0}\,\cos\left(\mathrm{PC}_k \wedge T_{i-0}\right)$$


in which T_{i-0} is the vector between the coordinates of structure i
and the chosen reference 0 (4YBQ, in this case), and PC_k is one of the
major principal component axes, which can classify and cluster
structures, and extract motion information and transition pathways
from them. For the MFS ensemble, the first component alone captures
about 65% of the structural variation associated with the rocker
switch, thus separating the crystallographic structures along the
transport cycle. The
angle between the sugar-porter tandem repeats was estimated as the 
angle formed by TM2 and TM6&, using an in-house Visual Molecular 
Dynamics script. 
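
The projection step can be illustrated with a small pure-Python sketch. It assumes that deviations are taken from the reference structure, that only the first principal component is needed (obtained here by power iteration rather than full diagonalization), and that each aligned structure is a flat list of coordinates; the function names and toy data are hypothetical. For a unit-length component, the scalar product of the deviation vector with the component equals the length of the deviation times the cosine of the angle between them, as in the expression above.

```python
import math

def deviations(structures, reference):
    """Displacement vector of each aligned structure from the reference."""
    return [[x - r for x, r in zip(s, reference)] for s in structures]

def covariance(devs):
    """Covariance matrix of the displacements about the reference."""
    n, m = len(devs[0]), len(devs)
    return [[sum(d[a] * d[b] for d in devs) / m for b in range(n)]
            for a in range(n)]

def first_component(cov, iterations=200):
    """Leading eigenvector by power iteration, normalized to unit length."""
    n = len(cov)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-uniform start
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if max(v, key=abs) < 0:  # fix the arbitrary sign of the eigenvector
        v = [-x for x in v]
    return v

def projection(dev, pc):
    """Scalar product of a displacement with a unit principal component."""
    return sum(t * p for t, p in zip(dev, pc))

# Toy ensemble: four 'structures' displaced along one direction by -1, 0.5, 2, 3.
amplitudes = [-1.0, 0.5, 2.0, 3.0]
reference = [0.0, 0.0, 0.0, 0.0]
structures = [[t * 0.6, t * 0.8, 0.0, 0.0] for t in amplitudes]
devs = deviations(structures, reference)
pc1 = first_component(covariance(devs))
scores = [projection(d, pc1) for d in devs]
```

In this toy case the scores recover the displacement amplitudes, which is the sense in which a single component can order structures along one dominant motion.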


Reporting summary 


Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The coordinates and the structure factors for PfHT1 have been
deposited in the PDB under accession code 6RW3. All data are available
in the paper or Supplementary Information.
Extended Data Fig. 1 | PfHT1 is distantly related to both the human GLUT
transporters and the bacterial homologue XylE. a, Unrooted phylogenetic
tree of human GLUT1, GLUT2, GLUT3, GLUT4 and GLUT5 (GLUT1-5), rat GLUT5,
E. coli XylE and PfHT1. b, Table of protein sequence identity of the
proteins shown in a (only rat, and not human, GLUT5 is shown).
c, Sequence alignment of PfHT1, human GLUT1-5, rat GLUT5 and E. coli
XylE. Secondary structure elements of PfHT1 are indicated above the
alignment, and coloured as in Fig. 1c. Residues conserved in at least
80% of the alignment are highlighted by red boxes, and gating residues
are highlighted by blue boxes. Conserved binding-site residues between
PfHT1 and human GLUT3 are indicated with purple filled dots and
non-conserved residues are indicated with non-filled purple dots.
Conserved residues close to the binding site are indicated with yellow
filled dots and non-conserved residues with non-filled yellow dots.
Black bars beneath the alignment indicate residues in the sugar-porter
motifs. The UniProt accession numbers of the aligned proteins are:
PfHT1 (Q7KWJ5), human GLUT1 (P11166), human GLUT2 (Q102R8), human
GLUT3 (P11169), human GLUT4 (P14672), human GLUT5 (P22732), rat GLUT5
(P43427) and XylE (P0AGF4). For the sake of clarity, residues 109-121
from XylE and residues 54-86 from human GLUT2 were omitted.
Extended Data Fig. 2 | PfHT1 has evolved to be an efficient polyspecific
sugar transporter. a, Size-exclusion chromatogram of DDM-purified PfHT1
showing that PfHT1 migrates as two oligomeric species (dimer and
monomer); the sample migrates as a monomer during SDS-PAGE.
b, Time-dependent uptake of [¹⁴C]D-mannose (black circles) by PfHT1 in
proteoliposomes. Inset, PfHT1 uptake of radiolabelled sugar (cyan
trace) compared with non-specific uptake estimated from radioactivity
measured from liposomes incubated without protein (black trace). Error
bars represent the mean ± s.e.m. of n = 3 biologically independent
experiments. c, As in b, for [¹⁴C]D-galactose. d, As in b, for
[¹⁴C]D-fructose. e, As in b, for [³H]D-xylose. f, As in b, for
[³H]D-glucosamine.


[Extended Data Fig. 3, panels a-f (plots not reproduced). a-e, Transport
rate versus sugar concentration (mM): a, D-glucose (K_M = 0.80 ± 0.08
mM); b, D-mannose (K_M = 1.48 ± 0.26 mM); c, D-galactose (K_M = 9.54 ±
0.75 mM); d, D-fructose (K_M = 9.55 ± 1.70 mM); e, D-glucosamine (K_M =
14.69 ± 4.21 mM). f, Bar chart of specificity constants for the five
sugars.]
g.
Sugar           K_M (mM)        V_max (µmol min⁻¹ mg⁻¹)   k_cat (s⁻¹)     k_cat/K_M (mM⁻¹ s⁻¹)
D-glucose       0.80 ± 0.08     21.11 ± 0.82              19.35 ± 0.75    23.75 ± 2.48
D-mannose       1.48 ± 0.26     13.04 ± 0.85              11.95 ± 0.78    8.10 ± 0.95
D-galactose     9.54 ± 0.75     50.29 ± 1.75              46.10 ± 1.61    4.80 ± 0.27
D-fructose      9.55 ± 1.70     33.11 ± 2.09              30.36 ± 1.91    3.18 ± 0.38
D-glucosamine   14.69 ± 4.21    8.72 ± 1.11               8.01 ± 0.92     0.54 ± 0.10


Extended Data Fig. 3 | PfHT1 has evolved to be an efficient polyspecific
sugar transporter. a, Zero trans kinetics of PfHT1 D-glucose transport.
Kinetic curves were fitted from data points recorded at increasing
D-glucose concentrations after 20 s and fitted by nonlinear regression.
Error bars represent mean ± s.e.m. of n = 3 biologically independent
experiments. b, As in a, for D-mannose. c, As in a, for D-galactose,
except that time points were recorded after 60 s. d, As in a, for
D-fructose, except that time points were recorded after 60 s. e, As in
a, for D-glucosamine. f, Bars represent specificity constant
(k_cat/K_M) values of PfHT1 for different sugars as tabulated and
described in g. g, The fitted values reported for the Michaelis
constant (K_M) and V_max of PfHT1 for different transported sugars are
mean ± s.e.m. of n = 3 biologically independent experiments. Turnover
(k_cat) and specificity constant (k_cat/K_M) values are derived from
these kinetic parameters and adjusted to the amount of protein
reconstituted into liposomes (Methods).
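
How the turnover and specificity constants relate to the fitted parameters in panel g can be sketched as follows. The 55 kDa molecular mass is an assumption: it is not stated here, but is inferred from the tabulated k_cat/V_max ratio, which is about 0.917 for every sugar. The function names are hypothetical.

```python
ASSUMED_MASS_KDA = 55.0  # assumption inferred from the table, not a stated value

def kcat_from_vmax(vmax_umol_min_mg, mass_kda=ASSUMED_MASS_KDA):
    """Convert V_max (umol min^-1 mg^-1) to k_cat (s^-1).

    1 kDa corresponds to 1 mg per umol, so V_max x mass gives min^-1;
    dividing by 60 gives s^-1.
    """
    return vmax_umol_min_mg * mass_kda / 60.0

def specificity_constant(kcat_per_s, km_mM):
    """Specificity constant k_cat/K_M in mM^-1 s^-1."""
    return kcat_per_s / km_mM

# Mean values transcribed from panel g: (K_M, V_max, tabulated k_cat).
panel_g = {
    "D-glucose": (0.80, 21.11, 19.35),
    "D-mannose": (1.48, 13.04, 11.95),
    "D-galactose": (9.54, 50.29, 46.10),
    "D-fructose": (9.55, 33.11, 30.36),
    "D-glucosamine": (14.69, 8.72, 8.01),
}
derived_kcat = {sugar: kcat_from_vmax(vmax) for sugar, (_, vmax, _) in panel_g.items()}
glucose_specificity = specificity_constant(19.35, 0.80)
```

Ratios computed from the rounded means differ slightly from the tabulated specificity constants (for D-glucose, about 24.2 compared with 23.75 ± 2.48), which is within the reported error.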
Extended Data Fig. 4 | Overall structural features of the PfHT1 structure.
a, PfHT1 crystallized as a dimer with four molecules in the asymmetric
unit. The shared dimer interface was formed between the respective
N-terminal domains (blue); although not extensive (522 Å² buried
surface area), it is consistent with the fact that a fraction of
purified PfHT1 migrates as a dimer by size-exclusion chromatography
(Extended Data Fig. 2a). Notably, the gating helix TM7b does not make
any crystal contacts. b, Superposition of the outward-occluded GLUT3
(PDB 4ZWB) (grey) and the occluded PfHT1 structures. The r.m.s.d. is
1.4 Å for 446 pairs of Cα atoms (Methods). c, Cartoon representation
of PfHT1 as viewed from the cytoplasm. Blue, NTD; magenta, CTD. ICHs
are not shown for clarity. Interdomain salt-bridge-forming residues
are shown as sticks, and labelled. d, Cartoon representation of TM7b
of human GLUT1 (PDB 4PYP) (orange) and bovine GLUT5 (PDB 4YB9)
(purple) in the inward-open conformation, and PfHT1 in the
D-glucose-bound (yellow sticks) occluded conformation (magenta).
e, Electron density map 2F_o − F_c (1.5σ) (blue mesh) for the PfHT1
structure (left) and the D-glucose residues in the sugar-binding
pocket in cyan (right). The F_o − F_c (3.0σ) (green mesh) maps before
addition of and refinement in the presence of D-glucose are also
shown. Despite the high quality of the maps, we observed no electron
density for ICH5 (location in human GLUT3 shown as a dashed ellipse).
Extended Data Fig. 5 | Sugars assessed for inhibition of PfHT1.
a, Chemical structures of the investigated sugars used in
competitive-uptake assays. Differences in hydroxyl-group position of
the respective D-glucose epimers are coloured red. Sugars labelled
green were also tested as radiolabelled substrates in time-course
experiments (Fig. 1b, Extended Data Fig. 2b-f). b, The competitive
uptake of [¹⁴C]D-glucose by PfHT1 in proteoliposomes in the absence
(white bars) and presence of non-labelled sugars (black bars).
Non-specific uptake was estimated from radioactivity measured from
liposomes incubated without protein (red bar). Error bars represent
mean and s.e.m. of n = 3 biologically independent experiments.
Extended Data Fig. 6 | Assessed quality of purified PfHT1-GFP fusions and
analysis of the sugar-binding pocket of PfHT1. a, FSEC traces of
DDM-purified PfHT1-GFP wild type and mutants of the sugar-binding
pocket; PfHT1-GFP migrates as two species (dimer and monomer),
consistent with purified PfHT1 (Extended Data Fig. 2a). FSEC traces
were recorded at least twice for wild type and each respective mutant.
b, As in a, for mutants peripheral to the sugar-binding pocket. c, As
in a, for mutants located in TM1 and TM7b. d, Cartoon representation
of PfHT1 with D-glucose and interacting residues labelled, shown as
yellow sticks. The positions of D-glucose in E. coli XylE (green) (PDB
4JA3) and D-glucose in human GLUT3 (grey) (PDB 4ZW9) are shown as
sticks, after protein superimposition. e, Determination of the
Michaelis constant (K_M) for D-glucose by the PfHT1 mutant Ala404Glu,
constructed to mimic the human GLUT3 binding site. Kinetic curves were
fitted from data points recorded over a range of increasing D-glucose
concentrations after 90 s, and fitted by nonlinear regression using
data from n = 3 biologically independent experiments (values reported
are mean ± s.e.m. of the fit). f, As in e, for D-fructose.
g, Sugar-binding-site comparison between PfHT1 side chains (yellow
sticks) and rat GLUT5 side chains (conserved side chains, grey sticks;
non-conserved side chains, cyan sticks). h, As in e, for the PfHT1
mutant Trp412Ala, constructed to mimic the rat GLUT5 binding site.
Extended Data Fig. 7 | Small-molecule-inhibition analysis of PfHT1.
a, Surface transversal cross-sections through the membrane of the PfHT1
structure in the occluded conformation with D-glucose shown as sticks,
and a side vestibule accessible to the C3- and C4-hydroxyl groups that,
in human GLUT3, was occupied by a monoolein lipid. b, The competitive
uptake of [¹⁴C]D-glucose (black bars) by PfHT1 wild type and the mutant
W412A in proteoliposomes in the absence and presence of cytochalasin B
(Cyb) (120 µM),
n = 3 biologically independent experiments (left). Cytochalasin B and
the inhibitor C3361 (which is D-glucose with an undec-10-en chain at
the C3-hydroxyl position) are shown (right). c, IC50 curves for
MMV009085 (blue filled circles) and a derivative lacking the butanol
tails (grey filled circles). Error bars represent s.e.m. of n = 3
biologically independent experiments (left). Structures of MMV009085
and the derivative are shown on the right.
Extended Data Fig. 8 | The extracellular substrate-gating helix TM7b.
a, Cartoon representation of TM7b of human GLUT3 in the outward-open
(green) (PDB 4ZWC) and outward-occluded D-glucose-bound conformation
(light brown) (PDB 4ZW9) and of PfHT1 in the D-glucose-bound occluded
conformation (magenta). Arrows indicate that inward movement of TM7b is
coupled to coordination of the C3- and C4-hydroxyl groups of D-glucose
(shown as yellow sticks) by the strictly conserved asparagine residue
corresponding to Asn311 in PfHT1. Only in PfHT1 does TM7b break into
two perpendicular segments, as observed in the inward-facing structures
of GLUT1 and GLUT5 (Extended Data Fig. 4d). The asterisk highlights the
highly conserved tyrosine residues that occlude the substrate from
exiting in the outward-occluded conformations, and which in PfHT1 are
replaced by serine (S315) and asparagine (N316). b, Molecular dynamics
simulations of TM1-TM7 gating interactions: D-glucose-bound
outward-occluded gating interactions by human GLUT3 in the presence
(yellow) and absence (grey) of D-glucose. The distribution of
conformations, shown as the gating distance, is from three independent
1-µs molecular dynamics simulations, as described in the Methods. c, As
in b, for D-glucose-bound outward-occluded gating interactions by PfHT1
in the presence (magenta) and absence (yellow) of D-glucose. The
distribution of conformations, shown as the gating distance, is from
three independent 1-µs molecular dynamics simulations, as described in
the Methods. d, Snapshots of the distance between residues Lys51 and
Asn316 at the 0-ns, 500-ns and 1,000-ns time points.
Extended Data Fig. 9 | PCA analysis of sugar-porter structure and
alternating-access mechanism. a, Aligned core of near-intact MFS
transporter structures from 17 structures, including PfHT1, mammalian
(GLUT1, GLUT3 and GLUT5) and prokaryotic (XylE) systems. The 'core'
structural elements conserved among the structures are shown in
different colours (Methods). b, Motions along the first principal
component (PC1) derived from the core ensemble, tracking the
rocker-switch motion. c, Residue fluctuations computed from PC1, with
the core helix fragments shown as shadowed areas at the base.
d, Schematic of the structural basis of the sugar-porter
alternating-access mechanism. To summarize, in the outward and
outward-occluded conformations (PDB 4YBQ and 4ZW9) the substrate-gating
helix TM7b (magenta and transparent) is mobile and samples either
state, as seen in molecular dynamics simulation of human GLUT3;
spontaneous gate closure is further consistent with the fact that,
even in the presence of maltose, GLUT3 crystallizes in both
outward-open and outward-occluded conformations. Substrate binding
conformationally stabilizes the outward-occluded state, thus increasing
the likelihood for TM7b to break in the middle, completely close the
substrate pocket and form contacts with TM1. In the occluded state,
the salt-bridge interactions between ICH5 in the C-terminal bundle and
ICH1, ICH2, ICH3 and ICH4 are lost, which indirectly destabilizes the
highly conserved intrabundle salt-bridge network. Breakage of the
intrabundle salt-bridge network catalyses global rocker-switch
rearrangements of the N- and C-terminal bundles. In the inward-occluded
conformation (PDB 4JA3), the intracellular gating helix TM10b (cyan),
which is related by inverted symmetry to TM7b, spontaneously moves
outward to the inward-open conformation (PDB 4YB9). After sugar
release, the sugar porter spontaneously resets itself to the
outward-facing conformation through an 'empty' occluded state.
Spontaneous resetting means that the energetic barriers separating
opposite-facing states must be low enough that the occluded state can
form in the absence of sugar binding. Nevertheless, consistent with a
conformational-selection-driven rocker-switch mechanism, substrate
binding catalyses transport, as rates are substantially faster through
'substrate-bound' than through 'empty' occluded-state transitions.


Extended Data Table 1 | Zero trans proteoliposome (lacking internal sugar) in vitro kinetics of D-glucose and D-fructose transport by
PfHT1 wild type and mutants

            K_M (mM)                       V_max (µmol min⁻¹ mg⁻¹)         k_cat (s⁻¹)                     k_cat/K_M (mM⁻¹ s⁻¹)           Ratio of k_cat/K_M
            glucose        fructose        glucose         fructose        glucose         fructose        glucose         fructose       glucose/fructose
PfHT1 WT    0.80 ± 0.10    9.51 ± 1.70     21.12 ± 0.80    33.12 ± 2.14    19.32 ± 0.60    30.36 ± 1.91    23.81 ± 2.52    3.20 ± 0.41    7.44 ± 0.78
S315A       0.79 ± 0.21    11.16 ± 2.83    5.84 ± 0.52     22.78 ± 2.32    5.31 ± 0.52     20.81 ± 2.06    6.78 ± 1.13     1.78 ± 0.30    3.81 ± 0.67
E319A       0.73 ± 0.15    17.04 ± 2.94    10.01 ± 0.72    57.77 ± 4.63    9.14 ± 0.62     52.68 ± 4.23    12.51 ± 1.68    3.08 ± 0.37    4.06 ± 0.52
W412A       0.91 ± 0.22    -               4.63 ± 0.25     -               4.21 ± 0.25     -               4.67 ± 0.67     -              -
A404E       1.86 ± 0.43    22.01 ± 4.84    22.30 ± 1.63    33.71 ± 4.44    13.21 ± 0.94    30.69 ± 3.94    7.11 ± 1.02     4.17 ± 0.68    1.71 ± 0.28
N435A       0.91 ± 0.14    -               7.46 ± 0.42     -               6.84 ± 0.43     -               7.52 ± 0.78     -              -
GLUT5*      -              10.91 ± 1.78    -               52.30 ± 2.90    -               43.28 ± 2.40    -               4.04 ± 0.37    -

The results shown are from n = 3 biologically independent experiments (values reported are mean ± s.e.m. of the fit). The W412A and
N435A mutants abolish D-fructose uptake, and therefore D-fructose kinetics could not be measured (-).

*The D-fructose rat GLUT5 kinetic parameters in proteoliposomes were measured as for PfHT1, using protein purified as previously described.
Extended Data Table 2 | Data collection and refinement statistics (molecular replacement)

                            Glucose bound
Data collection
  Space group               P2₁
  Cell dimensions
    a, b, c (Å)             73.74, 189.45, 136.89
    α, β, γ (°)             90.0, 96.4, 90.0
  Resolution (Å)            21.85-3.65 (3.80-3.65)
  R_merge (%)               28.5 (>100)
  I/σ(I)                    9.8 (0.51)
  CC*                       0.99 (0.57)
  Completeness (%)          99.4 (99.7)
  Redundancy                12.4 (12.7)
Refinement
  Resolution (Å)            21.85-3.65
  No. reflections           41238
  R_work / R_free (%)       27.3 / 28.4
  No. atoms
    Protein                 13908
    Ligand/ion              96
    Water                   -
  B-factors
    Protein                 230.80
    Ligand/ion              196.06
    Water                   -
  R.m.s. deviations
    Bond lengths (Å)        0.012
    Bond angles (°)         1.54

Data were obtained by scaling together two datasets that were collected on different crystals. Values for the highest-resolution shell used in the final refinement are shown in parentheses.
Forests play a key part in the water cycle, so both planting and removing
forests can affect streamflow. In a recent Article¹, Evaristo and
McDonnell used a gradient-boosted-tree model to conclude that streamflow
response to forest removal is predominantly controlled by the potential
water storage in the landscape, and that removing the world's forests
would contribute an additional 34,098 km³ yr⁻¹ to streamflow worldwide,
nearly doubling global river flow. Here we report several problems
with Evaristo and McDonnell's¹ database, their model, and the
extrapolation of their results to the continental and global scale. The
main results of the paper¹ remain unsubstantiated, because they rely on
a database with multiple errors and a model that fails validation tests.


Database problems 


We spot-checked the database underlying Evaristo and McDonnell’s 
analysis¹ by comparing individual entries to the original cited references.
Roughly half of these spot checks revealed substantial errors in the
calculated changes in water yields, or errors in the classification of
individual studies as forest planting versus forest removal experiments.
Here we describe four examples. (1) The Valtorto catchment in Portugal
is classified as a forest clearing experiment¹ although the catchment was
never forested, but rather covered by 50-cm-tall heath”. The reported
post-clearing streamflow increase of 363.6% (ref. 1) is also inconsistent
with table 3 of ref.*, which reports that average streamflow increased
by 150%, from 1.0 m³ per day to 2.5 m³ per day. (2) The database reports
that forest clearing at the Lemon catchment in Australia increased
streamflow by 631.8% (ref. 1), but from table 1 of ref.’, we calculate that
the average pre- and post-clearing streamflows were 18.0 mm yr⁻¹ and
27.9 mm yr⁻¹, respectively, implying that streamflow increased by only
55%. (3) Brigalow catchments C2 and C3, which each appear twice in
the database, are classified as forest planting experiments¹ although
neither was planted with forest: C2 was planted with sorghum and wheat
and C3 was planted with buffel grass for pasture*». (4) Several forest
conversion experiments, in which forests were cleared and replanted
with other vegetation (for example, references 74, 114, 130 and 163 in
ref. 1), are reported in the database as showing, counterintuitively, large
streamflow increases caused by forest planting¹. However, the reported
changes in streamflow were calculated relative to intact forest control
plots, not cleared land, so they mostly reflect the effects of clearing
the existing forest rather than the effects of planting. We suspect that
this misattribution of forest clearing effects to forest planting may
underlie the paper’s surprising finding (see Fig. 2 of ref. 1 and
associated discussion) that forest planting appears to increase streamflow
by 100% or more at many sites, with the largest increases at sites with
the highest evapotranspiration rates, a pattern that would normally
arise from forest clearing instead.
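
The percentage changes quoted in examples (1) and (2) can be checked with a one-line formula. The following is a minimal sketch; the function name `percent_change` is ours, and the inputs are the pre- and post-clearing streamflows quoted in the text.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    if before == 0:
        raise ValueError("baseline streamflow must be non-zero")
    return 100.0 * (after - before) / before


# Valtorto: 1.0 to 2.5 m^3 per day; Lemon: 18.0 to 27.9 mm per year.
valtorto = percent_change(1.0, 2.5)
lemon = percent_change(18.0, 27.9)
print(f"Valtorto: {valtorto:.0f}%  Lemon: {lemon:.0f}%")
```

Both values round to the figures given above (150% and 55%), far below the 363.6% and 631.8% reported in the database.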


Model overfitting and validation failure 


Gradient-boosted regression trees are data-hungry, and although Evaristo
and McDonnell¹ compiled every paired watershed study that they
could find, the resulting databases of 161 forest clearing experiments
and 90 forest planting experiments are much too small to estimate
their seven-variable model reliably. We checked the model codes that
Evaristo and McDonnell provided with their paper (see the code
availability statement of ref. 1) and found that the boosted tree algorithm
fits 200 free parameters (not counting the dozens of additional free
parameters that define the tree’s branch points), suggesting substantial
overfitting. To test how this overfitting might affect the model’s
predictions, we split the forest removal and planting databases into
training sets (80% of the data) and test sets (the remaining 20% of the
data). To balance the distributions of the variables between the training
and test sets, we used stratified random sampling; we also used
un-stratified random sampling as a more stringent test. We then re-ran
the boosted-tree analysis, using the same data, the same platform
(JMP, the SAS Institute), and the same algorithm options that Evaristo
and McDonnell¹ used, for 300 of these random splits of the data, both
with and without ‘early stopping’ (in which the fitting algorithm stops
whenever the next layer would reduce the R²).

The results in Fig. 1 show that the model fails these validation tests.
If the model were not overfitted, the fits to the test data (as measured
by the test R² on the vertical axis) would be similar to the fits to the
training data (as measured by the training R² on the horizontal axis),
and the dots would lie close to the 1:1 line. Instead, many of the dots
lie far below the 1:1 line, and many test R² values even lie below zero,
indicating model predictions that are worse than random guessing.
Figure 1 thus shows that the model is overfitted and makes unreliable
predictions (because it is too flexible, and thus has been ‘fitted to the
noise’ in the training data). This result holds whether one uses ‘early
stopping’ or not, and both stratified and un-stratified validation tests
yield broadly similar results.
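
The logic of a split-sample test can be illustrated with a short sketch. It is not the JMP boosted-tree analysis described above: as loudly stated assumptions, it uses synthetic data with no real signal, an unstratified 80/20 split, and a hypothetical nearest-neighbour rule as a stand-in for an over-flexible model. It shows how a model that has been ‘fitted to the noise’ can reach a high training R² while its test R² falls below zero.

```python
import random
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination; negative when predictions are worse
    than always predicting the mean of y_true."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nearest_neighbour(train_x, train_y, x):
    """Hypothetical stand-in for an over-flexible model: it memorizes the
    training data and returns the response of the closest training point."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


def split_sample_trials(x, y, n_trials=300, test_fraction=0.2, seed=0):
    """Repeat unstratified random train/test splits and return a list of
    (training R^2, test R^2) pairs, one per split."""
    rng = random.Random(seed)
    n_test = round(test_fraction * len(x))
    pairs = []
    for _ in range(n_trials):
        idx = list(range(len(x)))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        tx, ty = [x[i] for i in train], [y[i] for i in train]
        fit = [nearest_neighbour(tx, ty, v) for v in tx]
        pred = [nearest_neighbour(tx, ty, x[i]) for i in test]
        pairs.append((r_squared(ty, fit), r_squared([y[i] for i in test], pred)))
    return pairs


# Synthetic data in which the predictor carries no information about y.
rng = random.Random(1)
x = [rng.random() for _ in range(100)]
y = [rng.gauss(0.0, 1.0) for _ in range(100)]

pairs = split_sample_trials(x, y)
median_train = statistics.median(p[0] for p in pairs)
median_test = statistics.median(p[1] for p in pairs)
fraction_negative = sum(p[1] < 0 for p in pairs) / len(pairs)
print(median_train, median_test, fraction_negative)
```

Summarizing many random splits by the medians of the training and test R², and by the fraction of test R² values below zero, avoids drawing conclusions from any single favourable split.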

Although individual randomizations can yield test R² values that are
similar to the training R² (or even higher), one should not draw
conclusions from such anomalies. Model performance is better reflected in
the medians of the training and test R² values across many randomization
trials (Table 1). Table 1 confirms quantitatively what Fig. 1 shows
visually: in each case, the median test R² is much smaller than the median
training R², and many test R² values are below zero.

All of the paper’s¹ main results are based on the boosted-tree model,
so the validation failure documented here invalidates the paper’s
conclusions. The other machine learning methods in the paper have similar
validation issues, but we will not explore them in detail because the
paper’s conclusions do not depend on them.


Fig. 1 | Split-sample validation tests of gradient-boosted-tree model fitted to
forest clearing and planting data. a, b, Model fitted to forest clearing data
with and without early stopping; c, d, model fitted to forest planting data with
and without early stopping. The source data were randomly split into 300
training and test sets in 80/20 ratios, as described in the text. If the model were
not overfitted, the R² statistics obtained from the training and test sets would
be similar to one another, and thus the dots would lie close to the 1:1 lines.
Instead, the test R² statistics are generally much smaller than the training R²
values. Points with test R² values less than −0.5, which indicate that model
predictions were much worse than random guessing, are not shown.


Table 1 | Summary of split-sample validation test results

Model and split-sample test performed    Median        Median     Fraction of
(80/20 split in all cases)               training R²   test R²    test R² < 0

Forest removal model
  Stratified, with early stopping        0.449         0.108      31%
  Stratified, without early stopping     0.605         0.096      36%
  Unstratified, with early stopping      0.458         0.053      34%
  Unstratified, without early stopping   0.608         0.057      40%

Forest planting model
  Stratified, with early stopping        0.827         0.455      13%
  Stratified, without early stopping     0.852         0.486      10%
  Unstratified, with early stopping      0.826         0.475      16%
  Unstratified, without early stopping   0.844         0.474      17%

Test results are shown for the boosted-tree model fitted to forest removal and forest planting
data. ‘Fraction of test R² < 0’ indicates the percentage of tests in which model predictions
were worse than random guessing.


Exaggerated importance of potential storage


The finding¹ that streamflow response to forest removal was primarily
controlled, not by climate, but by total potential water storage in the
landscape, was puzzling to us for two reasons. First, it was difficult to
imagine how total storage, much of which may lie below the rooting
zone of trees, could be the major control on the hydrological effects of
tree removal. Second, given that forest planting and forest removal both
alter the same variable (forest cover), but in opposite directions, it was
hard to reconcile the paper’s two main findings¹: that potential storage
is the dominant control on streamflow response to forest clearing (but
not planting), and that actual evapotranspiration (AET) is the dominant
control on streamflow response to forest planting (but not clearing).
Closer examination reveals that the apparent importance of potential
storage relies on one extreme data point (the Lemon catchment,
Australia), which has a potential storage of 15 m, more than twice the
next-highest value in the dataset. If we remove this one data point,
potential storage disappears as the most important factor (Table 2),
and is replaced by potential evapotranspiration (PET). This one data
point is so influential because Evaristo and McDonnell’s analysis¹ uses
an ‘independent uniform’ variable importance profiler. This profiler
is intended for use where the likely values of each variable will be
uniformly distributed over the range of the data’®, which is inconsistent
with the strongly skewed distributions of potential storage in Evaristo
and McDonnell’s paired watershed dataset (Fig. 2a) and in their global
catchment database (Fig. 2b). Potential storages exceeding 7.5 m
comprise only 0.6% of Evaristo and McDonnell’s paired watershed dataset
(light blue bars, Fig. 2a) and 6% of their global catchment database
(light blue bars, Fig. 2b), but 50% of the distribution used to calculate
the influence of potential storage, exaggerating its importance.
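
The mismatch between a uniform input distribution and a skewed dataset can be made concrete with a small sketch. It does not reproduce JMP's importance calculation. As assumptions, it uses synthetic storages (160 values below 7 m plus a single 15 m outlier, loosely imitating the situation described above) and two hypothetical helper functions that only imitate how ‘uniform’ and ‘resampled’ input values might be drawn.

```python
import random


def fraction_above(values, threshold):
    """Share of values that exceed the threshold."""
    return sum(v > threshold for v in values) / len(values)


def uniform_inputs(values, n, rng):
    """Draw n inputs uniformly over the observed range (min to max)."""
    lo, hi = min(values), max(values)
    return [rng.uniform(lo, hi) for _ in range(n)]


def resampled_inputs(values, n, rng):
    """Draw n inputs by resampling the observed values themselves."""
    return [rng.choice(values) for _ in range(n)]


rng = random.Random(0)
# Synthetic, strongly skewed storages (m); not the published dataset.
storages = [min(rng.expovariate(1.0), 7.0) for _ in range(160)] + [15.0]

in_data = fraction_above(storages, 7.5)
in_uniform = fraction_above(uniform_inputs(storages, 20_000, rng), 7.5)
in_resampled = fraction_above(resampled_inputs(storages, 20_000, rng), 7.5)
print(f"{in_data:.3f} {in_uniform:.3f} {in_resampled:.3f}")
```

With one outlier among 161 points, only about 0.6% of the data lie above 7.5 m, yet roughly half of the uniformly drawn inputs do, so the model's behaviour in an almost empty region of the data dominates the importance estimate.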
Although Evaristo and McDonnell fully documented their choice of
this ‘independent uniform’ profiler¹, other choices, more consistent
with the available data, lead to a different conclusion. For example, if
we instead use a profiling method that takes into account the actual
distributions of all of the variables (‘independent resampled’ profiling),
PET becomes the most important variable, and potential storage drops
to fourth place (Table 2). And if the profiling method also takes account
of the correlations among the variables, in addition to their actual
distributions (‘dependent resampled’ profiling), the most important
variable is again PET, and potential storage drops to fifth place out of
seven variables (regardless of whether we include or exclude the Lemon
catchment; see Table 2).


Table 2 | Relative variable importance using different profilers

Profiling method and     Potential            Runoff        Drainage    Potential   Actual               Root zone   Permeability
treatment of Lemon       evapotranspiration   coefficient   area        storage     evapotranspiration   storage
catchment

Independent uniform
  Lemon included         0.317 (2)            0.098 (3)     0.036 (5)   0.508 (1)   0.041 (4)            0.007 (6)   0.000 (7)
  Lemon omitted          0.500 (1)            0.056 (4)     0.031 (5)   0.299 (2)   0.179 (3)            0.001 (6)   0.001 (6)

Independent resampled
  Lemon included         0.642 (1)            0.114 (3)     0.165 (2)   0.094 (4)   0.030 (5)            0.005 (6)   0.000 (7)
  Lemon omitted          0.710 (1)            0.077 (4)     0.134 (2)   0.091 (3)   0.050 (5)            0.001 (6)   0.003 (7)

Dependent resampled
  Lemon included         0.440 (1)            0.189 (2)     0.171 (3)   0.137 (5)   0.109 (6)            0.155 (4)   0.095 (7)
  Lemon omitted          0.433 (1)            0.180 (2)     0.174 (3)   0.129 (5)   0.102 (6)            0.161 (4)   0.098 (7)

Relative importance scores for each of the seven variables in Evaristo and McDonnell’s forest removal model¹ are shown for three different profiling methods, and for including and excluding
the Lemon catchment (see text). Ranks are shown in parentheses. The most important variable in each case is highlighted in bold.


Exaggerated global streamflow implications 


To estimate the potential impact of forest clearing on global streamflow
(table 1 of ref. 1), Evaristo and McDonnell first applied their boosted-tree
model to a database of 442,319 catchments for which the required seven
input variables are available (whether or not they are actually forested).
Evaristo and McDonnell then multiplied the median of the modelled
percentage change in streamflow for each continent’s catchments by
the average continental river flow (see Table 3). Because less than 30%
of Earth’s land area is forested’, however, the potential percentage
increase in streamflow from forest clearing should not be applied to
the entire continental runoff; that is, one cannot clear forests from
the 70% of Earth’s land surface where no forests exist. Evaristo and
McDonnell’s calculation¹ implicitly assumes that Earth’s entire landmass
is forested, and leads to unrealistic results. For example, under
Evaristo and McDonnell’s median scenario¹, their table 1 implies that
total post-clearing runoff in Asia would be 95% of total Asian
precipitation’ (32,140 km³ yr⁻¹; Table 3), a runoff ratio that is rarely
observed even in urban areas. For Australia and Oceania, the results in
Evaristo and McDonnell’s¹ table 1 violate conservation of mass, with total
post-clearing runoff (1,970 km³ yr⁻¹ + 5,412 km³ yr⁻¹ = 7,382 km³ yr⁻¹)
exceeding total precipitation’ (6,405 km³ yr⁻¹).
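
The continental comparisons above, and the per-area conversions used in the paragraphs that follow, are simple arithmetic. The sketch below uses the runoff, precipitation and forest-area values quoted in this section and in Table 3; the function names are ours and are introduced only for illustration.

```python
KM_TO_MM = 1_000_000  # 1 km = 1e6 mm


def runoff_ratio(runoff_km3, change_km3, precipitation_km3):
    """Post-clearing runoff as a fraction of precipitation."""
    return (runoff_km3 + change_km3) / precipitation_km3


def depth_mm(volume_km3, area_km2):
    """Convert a volume spread over an area into a water depth in mm."""
    return volume_km3 / area_km2 * KM_TO_MM


def volume_km3(depth_in_mm, area_km2):
    """Convert a water depth in mm over an area into a volume in km^3."""
    return depth_in_mm / KM_TO_MM * area_km2


# Values in km^3 per year, taken from the text and Table 3.
asia = runoff_ratio(14_550, 16_062, 32_140)
australia_oceania = runoff_ratio(1_970, 5_412, 6_405)

FOREST_AREA_KM2 = 40e6  # roughly 40 million square kilometres
implied_depth = depth_mm(34_098, FOREST_AREA_KM2)
rough_volume = volume_km3(250, FOREST_AREA_KM2)

print(f"Asia runoff ratio: {asia:.2f}")
print(f"Australia and Oceania runoff ratio: {australia_oceania:.2f}")
print(f"Implied extra streamflow: {implied_depth:.0f} mm per year")
print(f"Rough global estimate: {rough_volume:.0f} km^3 per year")
```

A runoff ratio above 1 means that more water leaves as runoff than arrives as precipitation, which is the conservation-of-mass violation noted for Australia and Oceania.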

Distributed over the roughly 40 million square kilometres of the
Earth’s surface that is actually forested’, Evaristo and McDonnell’s
claimed global streamflow increase¹ of 34,098 km³ yr⁻¹ implies an
average of 850 mm yr⁻¹ more streamflow from cleared forest lands.
This value exceeds the streamflow increases that were measured in
every one of the 95 paired watershed studies reviewed by Stednick’,
and exceeds their average by a factor of five.

Back-of-the-envelope calculations suggest different conclusions.
Globally, evapotranspiration from forests is roughly 250 mm yr⁻¹ greater
than from croplands or grasslands”, and multiplying this difference by the
40 million square kilometres of global forests’ yields a rough estimate of
10,000 km³ yr⁻¹, less than one-third of Evaristo and McDonnell’s¹ result.
Even this may be an overestimate, because the lower evapotranspiration
rates of grasslands partly reflect the fact that they often occur in drier
climates; thus the difference between forest and grassland
evapotranspiration may exaggerate the effects of converting forests to
grasslands.


Fig. 2 | Distributions of potential storage, compared to the uniform
distribution used to estimate its influence in Evaristo and McDonnell’s
analysis¹. a, Distribution of potential storage in Evaristo and McDonnell’s
dataset of 161 paired watershed studies. b, Distribution of potential storage in
Evaristo and McDonnell’s database of over 400,000 catchments worldwide.


Table 3 | Modelled effects of forest cover change on continental runoff

Region                     Total river   Change in runoff in response to   Total river    Total           Change in runoff in response to  Median water yield in
                           runoff        forest-cover change (km³ yr⁻¹)ᵃ   runoff after   precipitation   forest-cover change (%)ᵈ         complete catchment
                           (km³ yr⁻¹)ᵃ                                     removal        (km³ yr⁻¹)ᶜ                                      dataset (%)ᵉ
                                         Planting        Removal           (km³ yr⁻¹)ᵇ                    Planting        Removal          Planting   Removal
Africa                     4,320         −605 (1,944)    8,986 (5,616)     13,306         20,780          −14.0 (45.0)    208.0 (130.0)    −14 (45)   208 (130)
Asia                       14,550        −1,979 (5,835)  16,062 (25,783)   30,612         32,140          −13.6 (40.1)    110.4 (177.2)    −14 (40)   110 (177)
Australia and Oceania      1,970         −412 (725)      5,412 (4,962)     7,382          6,405           −20.9 (36.8)    274.7 (251.9)    −21 (36)   275 (252)
Europe                     3,240         −875 (1,102)    813 (1,426)       4,053          7,165           −27.0 (34.0)    25.1 (44.0)      −27 (34)   25 (44)
North and Central America  6,200         −806 (2,034)    918 (2,102)       7,118          13,910          −13.0 (32.8)    14.8 (33.9)      −13 (33)   15 (34)
South America              10,420        0 (3,751)       1,908 (17,559)    12,328         28,355          0.0 (36.0)      18.3 (168.5)     0 (36)     18 (168)
Totals                     40,700        −4,676          34,098            74,799         109,755

Values with parentheses are medians (and interquartile ranges).
ᵃFrom table 1 of ref. 1.
ᵇSum of total river runoff and median change due to removal.
ᶜTotal precipitation from ref. ®, which is also the original source of the total river runoff values.
ᵈMedian and IQR of runoff changes, as percentage of total river runoff.
ᵉMedian and IQR of water yield predictions (each rounded to the nearest percentage point in the published database) for Evaristo and McDonnell’s 442,319 ‘complete’ catchments. These agree
within roundoff error with the percentages calculated by dividing the change in runoff by the total runoff for each continent. This agreement demonstrates that the changes in runoff shown in
table 1 of ref. 1 were calculated by multiplying the median (and IQR) of the percentage water yield predictions by the total river runoff, rather than by the runoff from forested areas.


Concluding remarks


Evaristo and McDonnell are valued colleagues of ours, and we greatly
appreciate their transparency in making their data and codes available,
without which the issues described here would have been much harder to
diagnose. We agree with them that streamflow response to forest management
is an important issue that deserves a comprehensive analysis, including
subsurface catchment characteristics as potential explanatory variables.

Readers should also keep in mind that this is not a purely academic
exercise. How much, and under what conditions, forests should be
cleared is an important policy question with wide-ranging consequences
for economies, societies and ecosystems. In that regard, we are
concerned that the conclusion that “forest removal can lead to increases
in streamflow that are around 3.4 times greater than the mean annual
runoff of the Amazon River” is overstated and could be misinterpreted.
The Amazon flows continuously, but the streamflow benefits of forest
clearing are transient, typically lasting only a few years, or at most
decades, after felling”. One must also keep in mind that the water transpired
by vegetation is an important source of precipitation farther downwind,
estimated to account for roughly 40% of continental precipitation”.
Thus, sustained large-scale clearing of forests would predictably lead
to precipitation decreases and drying of continental interiors, although
the precise magnitude of this effect remains difficult to constrain.


Data availability 


All of the data analysed here are available as described in the data
availability and code availability statements of ref. 1, or from the cited
references.
Planting and removal of forest affect average streamflow (also referred
to as water yield), but there is ongoing debate as to what extent this
long-term difference between precipitation and evapotranspiration is
modulated by local conditions. A recent paper by Evaristo and
McDonnell¹ introduces a conceptual vegetation-to-bedrock model to explain
variability in reported streamflow responses to changes in forest cover
based on an analysis of seven factors that describe climate, soil
properties and catchment size. Their analysis excludes well-known controls
(such as the percentage of catchment area under change’, forest type
and time since afforestation) that we show here to be important. By
excluding these primary controls, Evaristo and McDonnell risk attributing
water yield response to co-varying secondary controls rather than
to the underlying causes.

Fig. 1 | Impact of forest age on water yield response to forest planting. Data
points are from coniferous (triangles) and deciduous (circles) lysimeters at
Castricum (green) and St Arnold (red/orange). Dashed curves indicate
exponential fits with a characteristic timescale τ of 15 years, with a 10-year shift
assumed for the deciduous lysimeter in St Arnold. Letters A, B and C indicate
record length (or forest age) domains used in Fig. 2. The background histogram
shows the distribution of the record length of the forest planting studies used
by Evaristo and McDonnell. Note that most studies (82%) have a record length
of less than 30 years, and strong changes in water yield response are observed
in this period. This figure and Fig. 2 were generated by Matlab 2015b
(http://nl.mathworks.com/products/matlab/).

We illustrate the importance of the record length (or time since
afforestation) using unique long-term measurements of water yield
made under controlled conditions. At Castricum in The Netherlands,
and St Arnold in Germany, two large lysimeters were planted with
coniferous and deciduous trees in the 1940s and 1960s, respectively,
while reference conditions (bare soil and grassland, respectively)
were maintained in an additional lysimeter. At both stations, strong,
consistent and continuing declines in average water yield response
were observed over averaging periods that ranged from several years
up to the whole experiment duration (Fig. 1), coinciding with a steady
increase in tree height and biomass*” and in spite of possible
limitations in rooting depth. The declines follow an exponential decay (with
a coefficient of determination of 0.91 or larger) with an e-folding time
τ of 15 years and a stronger water yield response for coniferous forest
than for deciduous forest. As a result, each individual lysimeter already
covers a range in water yield response of 30% up to 70%, comparable
to the total range reported by Evaristo and McDonnell across different
watersheds¹. Similar response times were found for afforestation
experiments in deciduous broadleaf forest in North Carolina in the
USA* and at the German lysimeter station of Britz-Eberswalde®, while
analysis of long-term streamflow data in Sweden revealed similar strong
effects of forest biomass and age’.
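
To give a feel for what an e-folding time of 15 years means for records of different lengths, the sketch below evaluates the decaying factor exp(−t/τ). It is an illustration only: the text does not give the full functional form of the fits (offsets, asymptotes or the 10-year shift), so the function `remaining_fraction` is a hypothetical helper that shows only how much of an exponential adjustment is still to come after t years.

```python
import math

E_FOLDING_YEARS = 15.0  # timescale tau quoted in the text


def remaining_fraction(years, tau=E_FOLDING_YEARS):
    """Fraction of an exponential adjustment still to come after `years`."""
    return math.exp(-years / tau)


for record_length in (5, 15, 30, 75):
    share = remaining_fraction(record_length)
    print(f"{record_length:>2} years: {100 * share:.1f}% of the adjustment remains")
```

Under this assumption, more than a third of the adjustment is still to come after one e-folding time and more than a tenth after two, which is why averages over short records depend strongly on record length.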

The record length of the studies used by Evaristo and McDonnell¹
varies considerably from 1 year to 75 years, but is mostly lower than the
timescale of water yield response to forest growth of 15 years (Fig. 1).
Therefore, it is likely that the values reported in studies with record
lengths of up to once or even twice the e-folding time (15-30 years) are
in fact highly sensitive to the length of their record. The mixing of data
with variable record lengths could explain why Evaristo and McDonnell
find actual evapotranspiration (AET) to be the factor explaining most of
the magnitude, rather than timing, of water yield response to planting.
When the locations of stations with sufficient record length are added
to a global map of changes in forest cover over the recent decades°,
it becomes clear that accurate observations of long-term impacts of
forest planting on water yield are concentrated in only a few regions.
Strikingly, the forest cover change hotspots are observational blind
spots for water cycle impacts. Given the potential of large-scale
afforestation to offset carbon emissions’, a robust understanding of the
hydrological impacts of current and future forest management is more
important than ever.

Fig. 2 | Global tree canopy cover change distribution and record length of
water yield response to forest planting. Points/circles indicate locations of
forest planting studies used by Evaristo and McDonnell¹, with the size
reflecting the record length according to classes A, B and C as indicated in
Fig. 1. The background map shows changes in tree canopy cover over the period
1982-2016 obtained from a recent analysis of satellite data®.


Reporting summary 


Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


Five-year-average water yield observations used in the analysis are 
provided in Extended Data Table 1. 
Extended Data Table 1 | Observed water yield at long-term lysimeter stations

Site         Period      P        Reference   Broadleaf   Deciduous
St. Arnold   1966-1970   932.04   496.4       484.02      441.2
St. Arnold   1971-1975   677.28   357.24      340.44      191.8
St. Arnold   1976-1980   676.94   346.38      271.86      127.54
St. Arnold   1981-1985   773.34   439.62      334.94      198.8
St. Arnold   1986-1990   791.16   442.96      252.58      173.78
St. Arnold   1991-1995   872.9    530.08      328.16      276.72
St. Arnold   1996-2000   813.24   376.56      181.38      140.42
St. Arnold   2001-2005   835.98   391.74      153.74      171.48
St. Arnold   2006-2010   799.86   333.92      133.68      141.1
St. Arnold   2011-2013   703.43   253.97      130.6       NaN
Castricum    1941-1945   790.2    590.4       533.2       540.8
Castricum    1946-1950   791.4    596.4       433.4       351
Castricum    1951-1955   835.4    631.4       374         208.2
Castricum    1956-1960   857.6    664.4       339.4       190.6
Castricum    1961-1965   873.4    663.6       367.4       204
Castricum    1966-1970   910.8    700.2       366         175.6
Castricum    1971-1975   762.2    546         230.6       87.75
Castricum    1976-1980   783.6    597         270         122.8
Castricum    1981-1985   891.8    682         341.2
Castricum    1986-1990   848.8    657.2       361.2
Castricum    1991-1995   933.8    135.6       378.6
Castricum    1996-1997   744      550         145.5

Precipitation data are shown as reference. The reference lysimeter is grassland at St Arnold and bare soil at Castricum. Data after 2007 were not considered for the lysimeter with deciduous
forest at St Arnold owing to storm damage caused by cyclone Kyrill. All units are millimetres per year.




Last updated by author(s): Nov 14, 2019 


Reporting Summary 


Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency 
Statistics


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


x The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 
x A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 
z The statistical test(s) used AND whether they are one- or two-sided 


Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 
x A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
r Ol A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
Our web collection on statistics for biologists contains articles on many of the points above.


Software and code 


Policy information about availability of computer code 


Data collection No data was collected for this study 


Data analysis Graphs were produced in MATLAB 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability


Five-year average water yield observations used in the analysis are provided in Extended Data Table 1. 
Ecological, evolutionary & environmental sciences study design


All studies must disclose on these points even when the disclosure is negative. 


Study description Comment on previous paper, with simple analysis of decades-old data that has been used in many previous studies 
Research sample Data comes from fixed lysimeters stations, one for each land cover type 

Sampling strategy No sampling involved 

Data collection At the stations, observations have been made continuously for decades


Timing and spatial scale Stations have operated for decades, size of each lysimeter approximately 400 m²


Data exclusions Data after 2007 were not considered for the lysimeter with deciduous forest at St. Arnold due to storm damage caused by Cyclone 
Kyrill. 

Reproducibility n/a 

Randomization n/a 

Blinding n/a 

Did the study involve field work? Yes X|No 
Corrections & amendments


Author Correction: 
Structural basis for the 
drug extrusion 
mechanism by a MATE 
multidrug transporter 


https://doi.org/10.1038/s41586-019-1762-6 


Correction to: Nature https://doi.org/10.1038/nature12014 


Published online 27 March 2013 


Yoshiki Tanaka, Christopher J. Hipolito, Andrés D. Maturana, 

Koichi Ito, Teruo Kuroda, Takashi Higuchi, Takayuki Katoh, 

Hideaki E. Kato, Motoyuki Hattori, Kaoru Kumazaki, 

Tomoya Tsukazaki, Ryuichiro Ishitani, Hiroaki Suga & Osamu Nureki 


In this Article, three of the images in Fig. 2b are incorrect. In the growth
complementation tests of ΔacrB strains, all 14 lower panels, from (−)
for +Norfloxacin to M206A for +Norfloxacin, should have been cropped
from the original plate images. However, during the preparation of Fig.
2b we inadvertently cropped the lower panels for Y139A, N157A and
N180A from the wrong plates (P26A, M206A and M173A, respectively).
Figure 1 shows the incorrect, as-published original Fig. 2b and the
corrected Fig. 2b, with the three affected lower panels (Y139A, N157A and
N180A for +Norfloxacin) now corrected. Since all the growth scores for
both the corrected and the incorrect, as-published panels are negative
(that is, marked ‘−’ for not complemented), these errors do not change
our conclusion for the evaluation of the mutation effects. The original
Article has not been corrected online.


Fig. 1 | This figure shows the incorrect, as-published original Fig. 2b and the corrected Fig. 2b.
[Figure panels: ‘Incorrect’ and ‘Corrected’ versions of Fig. 2b. Panel labels: acrB⁺ and ΔacrB strains; (−), WT, P26A, D41A, D41N, Y139A, N157A, M173A, S177L, N180A, D184A, D184N, M206A and PfMATE; conditions −Norfloxacin and +Norfloxacin (0.02 µg ml⁻¹).]


Publisher Correction: 
Recycling lithium-ion 
batteries from electric 
vehicles 


https://doi.org/10.1038/s41586-019-1862-3 


Correction to: Nature https://doi.org/10.1038/s41586-019-1682-5 


Published online 06 November 2019 


Gavin Harper, Roberto Sommerville, Emma Kendrick, Laura Driscoll, 
Peter Slater, Rustam Stolkin, Allan Walton, Paul Christensen, 

Oliver Heidrich, Simon Lambert, Andrew Abbott, Karl Ryder, 

Linda Gaines & Paul Anderson 


In this Review Article, owing to a mistake in renumbering, there were 
several errors in the author affiliations in the HTML version. The PDF 
and print versions were correct. The errors have been corrected online. 


Retraction Note: 
Global analysis of 
streamflow response to 
forest management 


https://doi.org/10.1038/s41586-020-1945-1 


Retraction to: Nature https://doi.org/10.1038/s41586-019-1306-0, 
published online 17 June 2019; corrected online 30 September 2019; 
addendum 30 September 2019 


Jaivime Evaristo & Jeffrey J. McDonnell 


A few weeks after publication of this Article, as a result of comments
from James Kirchner and colleagues, we realized that our assembled
dataset of paired watershed studies, used to assess the streamflow
response to forest removal and planting, contains errors in the
percentage change in streamflow associated with land cover modifications.
Second, the effects of continent-wide forest removal on streamflow
(Table 1 of the Article) are overestimated, because we assumed a
starting condition of 100% forest cover. Third, there are serious concerns
regarding model validation that need to be assessed using the
corrected data. Correcting these honest mistakes goes beyond a simple
Author Correction, and therefore we and the Nature editors wish to
retract this Article. There are two Matters Arising that accompany this
Retraction Note, by James W. Kirchner et al.
(https://doi.org/10.1038/s41586-020-1940-6) and by Adriaan J. Teuling &
Anne J. Hoek van Dijke (https://doi.org/10.1038/s41586-020-1941-5). We have
decided not to respond because we do not wish to cause further confusion by
defending a retracted paper. We are working with Kirchner and colleagues
to constructively address the issues raised in a revised paper; if and
when it is published we will alert readers by posting a comment to this
Retraction.
Correspondence should be addressed to J.J.M.
Sports and science careers might be vastly different, but both can trigger an identity crisis.


AN ACADEMIC 
IDENTITY CRISIS 


Overdoing PhD work can lead to loss of identity. 
Three things help recover it. By Robert Seaborne 


uring the final 18 months of my PhD 
programme, I became incredibly 
absorbed in my work. For months 
on end, I could be found toiling 
in the laboratory or writing in an 
office for 13-14 hours per day. Evenings and 
weekends that I once spent playing football, 
going to the gym or socializing were instead 
used to work on my experiments, read, write 
or analyse data. I became obsessed with my 
project. Every waking moment was spent 
furthering my studies. Every conversation | 
had revolved around my work. I had become 
the living embodiment of my PhD, and com- 
pletely lost my sense of self. | had assumed a 
new identity: one that centred on my degree 
programme. 
Identity crises are neither a new nor a 


unique phenomenon. Elite athletes, for 
example, are particularly susceptible to 
them’, and these events have severe psycho- 
logical and performance-related effects. 
It’s easy to imagine why: the life of an ath- 
lete is the relentless pursuit of perfection 
in an extremely volatile environment. 
That promotes extreme dedication, and a
win-at-all-costs mentality. 

Research suggests that athletes who
identify entirely as athletes, as opposed
to those who see being an athlete as only a 
facet of their personality, are at greater risk 
of mental-health damage when this identity
is challenged, under threat or removed
entirely. These individuals have effectively 
built an entire identity around one compo- 
nent of their being. And when this identity 
is challenged or becomes strained, the indi- 
vidual perceives the threat as an attack or 
criticism of their entire person, leaving them 
psychologically and emotionally fragile. This 
is most strikingly seen in elite athletes who are 
forced to retire; this process effectively strips 
them of the one identity they have associated 
with for many years.

Elite sport and academia might seem like 
two completely distant worlds, but I think 
they are similar when it comes to their ability 
to trigger an identity crisis. Both are highly 
intensive, performance-driven, turbulent 
careers, with too many candidates trying to 
‘make it’ compared with the number of places 
available. 

My own identity had become entirely defined
by my PhD work, and I had created a personal- 
ity defined by just one aspect of my life. When 
this was under threat and challenged by poor 
results or failed experiments, I interpreted
these outcomes as evidence that my entire 
identity was a failure or was insufficient. Con- 
sequently, my emotional and psychological 
outlook ebbed and flowed to the rhythm of 
my PhD. During the highs, I was motivated, 
excited and passionate about life. But during 
the lows, I became irritable, aggressive and 
both physically and mentally drained. I was 
unstable and unhappy. 

I graduated towards the end of 2018, and 
it has taken me a full year to truly discover, 
understand and reflect on what this identity 
crisis was, how it affected me and what mech- 
anisms helped me to overcome it. Identifying 
and developing these coping strategies was 
crucial, and would have served me very well 
had I been advised of these tactics early in my 
studies. Here I describe three mechanisms that
worked for me, in the hope that they might 
benefit those who are currently in, or who 
might encounter, a similar scenario. 


Exercise 


Sport has always been a huge part of my life, 
but was something that I had lost during 
the intense periods of my PhD programme. 
Following the successful defence of my dis- 
sertation, I suddenly had a lot of spare time 
at weekends and evenings. So I decided to 
restart my outdoor exercise habits. I joined 
a local football team and a gym, and I began
recreationally going rock climbing and 
playing tennis. Committing to exercise and 
competitive sport again has helped me to have 
another element of my life to focus on outside 
academia. It gives me a lot of perspective, and
helps me to counterbalance the challenges I
face during my research career. 


Sleep 


During the most intense periods of my PhD 
programme, I prioritized my work over
everything else — including getting enough 
sleep. Your mind works in a much more 
efficient and productive manner if you are 
getting sufficient amounts of quality sleep. 
With this comes a better ability to interpret, 
process and deal with challenges at both the 
emotional and psychological level. 


Reading 


As researchers, we tend to be inquisitive 
and eager to learn. I realized that if I was to
try to resolve my psychological state, then I 
needed to understand the issue. And so, I read. 
I read books about how to control the mind
through to ones about the habits of highly
successful chief executives, businesses
and past and present sporting greats. They
helped me to learn a little about how the mind
works, and how I can better control my own. 

As a result, I slowly began to feel more at
ease with my thought processes, and began to 
understand more about who I was. Over time, 
I have slowly started to gain back an identity 
that I once lost to my PhD. 

Maintaining your personal identity in a 
career that is highly volatile, stressful and 
intense is difficult, and your sense of self 
can so easily be lost. However, it is crucial 
to differentiate yourself from your work 
in order to maintain both your mental and 
physical health. It is important to understand 
that successes and failures in your research 
career do not and should not define who you 
are. You are a person long before you're a PhD
researcher. 


Robert Seaborne is a postdoctoral researcher 
at Queen Mary University of London. 
e-mail: r.sseaborne@qmul.ac.uk 
MEN SELF-HYPE
THEIR PAPERS 


Sensationalistic words attract citations — and 
men more often use them. By Chris Woolston 


A language analysis of titles and
abstracts in more than 100,000
scientific articles found that papers
with both first and last authors who
were women were about 12% less
likely than male-authored papers to include 
sensationalistic terms such as ‘unprecedented’,
‘novel’, ‘excellent’ or ‘remarkable’. The study,
published in The BMJ¹, also found that papers
missing such words garnered significantly 
fewer citations. 

Researchers tracked 25 positive terms in 
clinical-research articles published between 
2002 and 2017, and input the authors’ names 
into the Genderize database to predict their 
genders. The team then created models 
that compared the citation rates and word 
choice of articles published in the same 
journals in the same year with the same sub- 
ject keywords. 

The articles in each comparison were 
presumably of similar quality, but those that 
had positive words in their title or abstract
garnered 9% more citations overall, and 13%
more citations in high-impact journals. 

The relative reluctance of female authors 
to use self-flattering words could contribute to
a gender gap in citations and impact, says lead
author Marc Lerchenmueller, an economist 
at the University of Mannheim in Germany 
and the Yale School of Management in New 
Haven, Connecticut. In the big picture, 
he adds, these results should encourage 
scientific authors and editors to think about 
word choice and its effects. “Scientists should
discuss whether using such sales terms 
is a disservice to the scientific enterprise,” 
he says. 


An increasing practice


The discussion seems to be becoming more 
important: the analysis also found that 
such self-flattering words were 80% more 
common in 2017 than they were in 2002. 
Lerchenmueller notes that this time period 
marked an explosion in the number of pub- 
lished articles. “Authors are trying to present 
research as favourably as possible to attract
attention,” he says. 

At this point, it’s impossible to pinpoint 
exactly why male and female authors would 
take a different approach to promotional 
language, Lerchenmueller adds. He points to 
decades of studies suggesting women are more 
likely than men to face a backlash from peers 
and society when they stray beyond stereotypical
norms. Women who have been chastised in
the past for being too forceful or boastful might 
edit themselves and tone down their language, 
he says. Sensationalistic words could also 
be added or removed at some point during 
the editorial process, and Lerchenmueller
thinks that this possibility warrants closer 
examination. 


The impact of words


This relative lack of inflated language in 
female-authored papers echoes a 2019 
experimental study published by the National 
Bureau of Economic Research², showing
that women gave themselves relatively poor 
marks in interviews, performance reviews, 
job applications and other settings. “We 
found a large and robust gender gap in 
self-promotion,” says Christine Exley, who 
is a business-administration researcher at 
Harvard Business School in Boston, Massachu- 
setts. In one measure, women were less likely 
to describe their performance favourably 
when selecting from a list of potential adjec- 
tives that ranged from ‘terrible’ to ‘excellent’. 
Exley notes that in an experimental setting, 
women should have felt no fear of backlash 
for over-hyping themselves — but the gender 
gap still persisted. 

Lerchenmueller feels that his study touches 
on some important philosophical questions 
about the power and meaning of words. “Is 
language a mirror of society, or does it shape 
society?” In the world of science, he says, 
language seems to both reflect and promote 
bias — and female researchers are facing the 
consequences. 


Chris Woolston is a freelance writer in Billings, 
Montana. 


1. Lerchenmueller, M. J., Sorenson, O. & Jena, A. B. Br. Med. J. 
367, l6573 (2019).

2. Exley, C. L. & Kessler, J. B. NBER working paper 26345 
(2019). 
Eat. Sleep. STEM. Repeat: these
words on my T-shirt are the mantra 
of Stemettes, a UK-based outreach 
enterprise I co-founded in 2013 that 
encourages girls and young women 
to enter careers in science, technology, 
engineering and mathematics (STEM). 

I’m a computer scientist and have worked
for firms including Goldman Sachs, Deutsche 
Bank and Hewlett-Packard. But I decided to 
launch this business because I wanted to have a
wider impact. If I can inspire more girls to build
something for themselves, that’s even more 
important than me making another algorithm 
or widget. 

I especially love events like the one
pictured here, when I and 7 other
‘Stemettes’ spent the day with 200 girls
between the ages of 15 and 19. We gathered 
at G-Research, a data and technology 
company in central London that hires 
people with PhDs to work with maths and 
algorithms. The girls spoke to the audience 
about things they’re passionate about, 
and had mock interviews with company 
employees. This all builds confidence, 
and you can tell when girls have a eureka
moment. That’s what we’re there for. 

We also run events with banks, energy 
companies and the UK National Health 
Service. The girls see that all sorts of people 
work at such places, including women like 
me. I don’t have to be super corporate or
change my hair or the way I speak to do my
job. I’m wearing trainers. I can be authentic,
and they can, too. We show the girls that you 
don’t have to be a maths genius to work in 
tech. Digital literacy shouldn't be elitist. 

Nearly everyone who signs up for 
our events, whether through school or 
individually, is female or non-binary. It’s 
something we mandate — girls tend to be 
more open without a bunch of teenage boys 
around. Technology hasn’t always had the 
positive impact on the world, the workforce 
or our daily lives that it could have. Maybe 
it’s because we don’t always have the right 
people in the room.


Anne-Marie Imafidon is the co-founder and 
chief executive of Stemettes in London, UK. 
Interview by Chris Woolston. 


